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ABSTRACT 

A  store  and  forward  computer  communication  network  may  concurrently 
support  more  than  one  type  of  user.  Because  of  different  user  charac¬ 
teristics,  the  control  of  each  type  of  user  may  be  done  by  different  mecha 
nisms.  Moreover,  for  each  type  of  user  there  may  be  more  than  one  control 
mechanism  operating  concurrently.  Algorithms  previously  proposed  for  user 
mechanism  pairs  were  analysed  independently  ignoring  the  significant 
interactions  among  them. 

In  this  thesis,  we  consider  the  voice  user  and  the  data  user,  and  the 
routing  mechanism  and  the  flow  control  mechanism.  We  propose  four  algo¬ 
rithms,  one  for  each  user-mechanism  pair.  We  show  that  these  algorithms 
are  compatible  in  the  sense  that  they  can  be  coordinated  to  achieve  some 
reasonable  objective  when  operating  concurrently.  Moreover,  some  of  the 
algorithms  are  superior  in  some  aspects  to  existing  ones. 
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1.  Introduction 


1.1  Problem  Definition 

The  concept  of  store  and  forward  operation  for  communication  networks 
was  introduced  in  order  to  increase  efficiency  by  sharing  network 
resources,  but  this  efficiency  decreases  rapidly  when  the  network  becomes 
congested.  Because  the  nodal  storage  capacity,  which  is  needed  for  the 
store  and  forward  operation,  is  limited,  a  node  cannot  accept  a  packet  when 
its  storage  area  is  full.  Consequently,  a  node  whose  storage  area  is  full 
may  cause  a  fill  up  of  the  storage  area  in  adjacent  nodes  inhibited  from 
forwarding  packets  to  it.  It  is  not  hard  to  imagine  how  this  fill-up 
might  propagate,  causing  the  network  to  enter  virtual  standstill,  with  each 
node  waiting  for  the  other,  as  input  packets  try  to  enter  the  network  and 
none  succeeds. 

Routing  and  flow  control  are  two  mechanisms  to  avoid  congestion.  To 
understand  the  exact  role  of  each  of  them,  it  is  helpful  to  view  a  com¬ 
munication  network  as  a  network  of  queues  in  which  the  server  has  to  devote 
part  of  his  service  capacity  to  the  control  of  the  queue.  Assume  that 
the  part  of  the  service  capacity  that  goes  for  the  control  of  the  queue 
is  proportional  to  the  number  of  customers  in  the  queue.  Thus  as  the 
number  of  customers  in  the  queue  increases  the  actual  service  rate 
decreases.  If  the  number  of  customers  in  the  queue  exceeds  some  threshold 
the  actual  service  rate  drops  to  zero  and  the  queue  is  deadlocked. 

The  routing  in  the  context  of  this  analogy,  tries  to  avoid  a  queue 
deadlock  and  to  increase  the  actual  service  rate  by  distributing  the 
streams  of  the  incoming  flow  among  many  queues.  In  this  way  the  effective 
arrival  rate  at  each  queue  within  the  network  is  low,  allowing  a  higher 
fraction  of  the  server  time  to  be  devoted  to  an  actual  service  and  reducing 
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the  probability  of  a  queue  deadlocking.  Moreover,  in  this  way  a  packet  is 
cleared  from  the  network  faster,  reducing  the  total  resources  consumed  by 
i  t. 

But,  routing  can  only  help  in  reducing  the  apriori  chance  of  any  queue 
deadlocking.  It  cannot  totally  prevent  it.  This  is  due  to  the  nondeter- 
ministic  arrival  of  packets  to  the  network,  and  to  the  nondeterministic 
manner  in  which  a  packet  is  forwarded  in  the  network. 

Prevention  of  this  chance  is  the  task  of  the  flow  control  mechanism. 

In  particular,  end  to  end  and  flow-control  curtails  the  flow  directed  to  the 
congested  area  until  the  congestion  subsides.  However,  in  many  instances 
end  to  end  flow-control  will  not  be  enough  and  it  will  be  operated  in 
conjunction  with  another  mechanism  which  controls  packet  forwarding  inside 
the  network. 

That  routing  and  flow  control  have  the  same  role  of  preventing 
congestion  makes  it  hard  to  distinguish  where  routing  ends,and  end  to  end 
flow-control  starts.  A  routing  algorithm  which  reacts  to  the  network  state 
provides  a  measure  of  flow  control  by  aposteriori  diverting  flow  from 
congested  to  uncongested  areas;  a  good  flow  control  algorithm  provides 
rerouting  of  flow  by  allowing  higher  flows  through  uncongested  areas  while 
curtailing  the  flow  to  congested  ones. 

Traditionally,  in  order  to  understand  the  exact  role  of,  and  improve 
or.,  schemes  of  congestion  prevention,  routing  and  flow  control  have  been 
dealt  with  separately.  When  dealing  with  routing,  inputs  are  usually 
considered  fixed;  when  dealing  with  flow  control,  fixed  routing  has  been 
assumed.  But,  in  view  of  the  congestion  prevention  objective  that  both 
routing  and  flow-control  try  to  achieve,  they  ultimately  should  be  analysed 
as  one  mechanism.  In  making  routing  changes,  the  effect  of  the  flow- 
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control  on  the  inputs  should  be  taken  into  account;  while  when  changing 
flow-control  parameters  the  reaction  of  the  routing  mechanism  to  the 
changing  input  should  be  considered. 

Many  routing  and  flow-control  algorithms  that  were  suggested  when 
viewing  each  of  them  separately,  turned  out  to  be  incompatible  when  viewed 
as  a  single  system.  The  usual  problem  is  that  when  the  flow-control  para¬ 
meters  remain  fixed,  a  routing  change  causes  an  input  change.  This  routing 
change  which  can  be  shown  to  achieve  a  reasonable  objective  were  the  inputs 
to  remain  fixed,  might  achieve  the  opposite  objective  when  the  inputs  are 
subject  to  the  flow-control  mechanism. 

Recently,  the  interest  in  providing  the  store  and  forward  communication 
network  with  the  capability  to  accommodate  the  conversational -voice  customer 
has  mounted.  Like  a  computer  that  waits  for  a  new  request  after  dumping  a 
chunk  of  information,  or  like  the  terminal  user  who  has  to  type  in  a  new 
line  after  transmitting  the  previous  one,  so  too,  a  user,  who  has  finished 
speaking  to  another,  either  listens  to  the  other's  response  or  pauses 
before  going  on.  Inasmuch  as  the  voice  user  does  not  utilize  the  network 
continually  at  the  same  rate,  the  store  and  forward  concept  is  also 
suitable  for  him.  The  advent  of  the  technology  of  packetizing  voice  has 
given  an  incentive  to  research  on  the  particular  problems  that  may  result 
from  incorporating  the  voice  user  into  the  network. 

The  problem  of  congestion  and  the  mechanisms  of  routing  and  flow 
control  should  apply  equally  well  to  different  types  of  network  users,  such 
as  voice  and  data,  since  it  is  only  the  notion  of  statistical  multiplexing, 
as  in  a  computer  network  implemented  by  the  store  and  forward  method,  which 
gives  rise  to  these  problems  and  mechanisms.  Nevertheless,  different 
algorithms  for  routing  and  flow  control  are  called  for  because  of  the 
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different  performance  measures  that  characterize  the  delivery  of  data  as 
opposed  to  voice.  While  data  is  permitted  to  experience  long  delay,  its 
fidelity  cannot  be  subject  to  compromise  (i.e.,  errorless  delivery);  voice, 
on  the  other  hand,  is  permitted  to  undergo  relative  degradation  of  fide¬ 
lity,  but  it  can  tolerate  only  short  delay  if  the  continuity  of  the  conver¬ 
sation  is  not  to  be  Impaired. 

Thus,  on  top  of  the  need  to  design  voice  flow»control  and  routing 
algorithms  that  are  compatible  and  operate  properly  as  one  system,  we  are 
faced  with  the  problem  of  interaction  between  two  different  congestion  pre¬ 
vention  systems,  operating  in  the  same  network. 

In  this  Thesis  we  set  out  to  come  up  with  algorithms,  and  improve  on 
existing  algorithms  in  the  four  areas  of  data  routing,  data  flow  control, 
voice  routing  and  voice  flow  control.  Though  some  of  th;  improvements  are 
quite  elaborate,  their  motivation  and  the  direction  they  took,  were  with  an 
eye  toward  the  ultimate  goal  of  operating  them  concurrently  in  the  same 
network. 

1.2  Summary  of  Previous  Work 
1.2.1  Routi ng 

Routing  algorithms  have  been  categorized  as  static,  quasi -static  or 
dynamic,  and  centralized  or  distributed.  Attempts  to  route  dynamically, 
i.e.,  a  routing  based  on  the  instantaneous  state  of  queues  in  the  network, 
are  few  and  not  highly  successful  [45],  [46].  The  difficulty  stems  from  the 
need  for  almost  instantaneous  communication  and  computation  on  the  one 
hand,  and  very  complex  computation  and  control  on  the  other.  The  static 
algorithms,  by  which  inputs  are  fixed  and  one  deals  with  flows  rather  than 
queues,  were  applied  to  communication  networks  after  Kleinrock  formulated 
routing  as  a  multicommodity  flow  problem  [19].  These  algorithms  are  useful 
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in  various  applications  as  well  as  in  communication  networks  (see  [20]). 
Most  of  them  are  centralized  and  have  a  low  rate  of  convergence. 

Not  long  ago,  Gallager  came  up  with  the  notion  of  quasi -static  routing 
[10].  Such  a  routing  is  done  by  an  algorithm  which  operates  "on  line".  It 
measures  the  current  traffic  parameters,  carries  out  one  or  more  iterations 
of  a  static  algorithm  using  these  parameters,  implements  the  resulting 
routing  variables  on  the  networks,  measures  the  resulting  traffic 
parameters,  and  so  on.  If  inputs  were  fixed,  the  algorithm  would  converge, 
while  for  slowly  varying  inputs  the  hope  is  that  the  algorithm  will  be  able 
to  "track"  the  variations  and  keep  routing  not  far  from  optimal  at  all 
ti mes . 

Every  static  algorithm  whose  intermediate  results  improve  routing  and 
whose  parameters  are  independent  of  the  iteration  number  or  the  input 
levels,  can  serve  as  a  quasi-static  algorithm.  The  particular  algorithm 
suggested  by  Gallager  had  the  feature  of  distributed  computation,  and  the 
routing  variables  used  were  fractions  of  flows  decomposed  by  destinations. 
Gallager's  algorithm  has  the  disadvantages  of  slow  convergence,  and  the 
need  for  a  stepsize  parameter,  the  proper  value  of  which  depends  on  the 
input  levels. 

In  Bertsekas  _et  a]_.  [11]  and  O'Leary  [23],  improvements  to  Gallager's 

algorithm  were  suggested  by  introducing  second  derivative  information. 

These  algorithms  tried  to  approximate  Newton's  method.  Still,  by  upper 
bounding  or  neglecting  Hessian  cross  terms,  the  problem  of  slow  convergence 
persisted.  Similar  algorithms  which  operate  in  the  space  of  path  flov/s 
were  suggested  by  Aashtiani  [24]  and  Bertsekas  [25]. 

All  of  the  preceding  algorithms  tried  to  minimize  an  objective  func¬ 
tion,  consisting  of  average  delay  or  equivalently  the  number  of  outstanding 


11 


packets  in  the  network.  (More  realistically,  the  objective  function  is 
some  crude  measure  of  congestion.)  Another  objective  function  which  tries 
to  alleviate  bottleneck  congestion  leads  to  a  formulation  of  the  routing 
problem  as  a  min-max  linear  program.  Ros  Peran  [26]  investigated  the 
implementation  of  the  simplex  algorithm  for  this  problem.  Defenderfer  [24] 
and  Vastola  [29]  tried  to  solve  the  linear  problem  by  a  sequence  of  non¬ 
linear  ones,  achieving  shared  computation.  But  both  algorithms  are  static; 
both  have  to  carry  subproblem  optimization  to  which  they  have  no  bound  to 
the  number  of  iterations  (i.e.,  no  a  priori  knowledge  of  the  amount  of 
computation  involved  in  one  basic  iteration).  Computational  results  by 
Vastola  [29]  indicate  that  the  routings  resulting  form  the  different 
objective  functions  are  quite  close. 

Most  of  the  research  in  routing  has  been  done  in  the  context  of  data. 
When  voice  is  involved,  the  strict  continuity  requirements  on  the  packet 
delivery  makes  routing  on  a  virtual  circuit  almost  a  must.  Routing  on 
virtual  circuits  falls  in  the  domain  of  integer  programming  for  which 
analytical  techniques  are  not  as  well  developed  as  those  for  noninteger 
programming.  But,  when  many  conversations  between  each  pair  of  nodes  are 
assumed,  one  may  approximate  the  problem  by  a  conti nous  one  [30]. 

1.2.2  FI  ow -Control 

Most  of  the  flow  control  schemes  are  ad-hoc  and  hard  to  analyze.  Many 
of  them  are  buffer  management  schemes  that  depend  on  statistical  variables 
such  as  queue  length  and  need  to  be  analyzed  for  a  particular  buffer 
length.  Since  we  are  interested  in  results  that  are  general  in  nature  and 
compatible  with  quasi -static  muting,  we  focus  attention  on  flow-control 
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algorithms  that  will  curtail  inputs  even  when  nodal  buffers  are  of  infinite 
length. 

Three  such  schemes  are  all  tied  to  the  notion  of  "permit".  The  first, 
the  window  scheme  [21],  restricts  the  number  of  permits  for  each  origin- 
destination  (OD)  pair.  The  second,  the  isarithmic  scheme  [21],  restricts 
the  number  of  permits  only  for  the  network  as  a  whole,  and  the  third  [22] 
uses  queues  of  permits  for  each  origin-destination  pair. 

The  window  scheme  is  implemented  in  today's  ARPANET.  Each  origin- 
destination  pair  keeps  an  account  of  the  packets  sent.  An  origin  sending 
a  packet  incurs  a  unit  debt  until  the  packet  is  acknowledged  by  the  destina¬ 
tion.  Each  00  pair  is  assigned  a  number,  called  a  "window",  of  the  maximum 
debt  the  source  may  incur.  When  congestion  builds  up,  the  acknowledgement 
arrival  slows  down  and,  as  a  result,  the  input  rate  of  the  source  drops. 

When  the  number  of  conversations  is  too  large,  this  drop  may  not  be  suf¬ 
ficient  to  alleviate  the  congestion. 

The  isarithmic  scheme  was  proposed  for  the  National  Physical 
Laboratory  network.  A  network  is  allocated  a  number  of  "permits",  which 
are  just  small  packets.  The  permits  circulate  randomly  in  the  network.  A 
packet  may  enter  the  network  only  if  it  "captures"  a  permit.  When  the 
packet  arrives  at  the  destination,  the  permit  is  released  back  to  circulate 
in  the  network.  Obviously,  this  scheme  restricts  the  total  number  of 
outstanding  packets  in  the  network.  There  are  two  major  difficulties  with 
this  scheme:  controlling  the  distribution  of  permits  in  the  network  so  that 
a  single  OD  pair  will  not  capture  an  "unfair"  share  of  permits,  and 
guarding  against  unintentional  generation  or  loss  of  permits. 

The  third  scheme  was  proposed  recently  by  Kleinrock  and  Tseng  [22].  In 
this  scheme,  each  origin  has  a  finite,  usually  small,  buffer  ^or  permits. 
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Permits  are  generated  in  some  prescribed  Poisson  rate,  and  are  queued  in 
the  buffer.  A  permit  which  finds  the  buffer  full  is  destroyed.  A  packet 
may  enter  the  network  after  it  captures  a  permit  from  the  appropriate 
queue.  That  captured  permit  is  destroyed.  When  a  source  tries  to  increase 
its  rate  too  much,  many  packets  will  find  empty  permit  queue  and  the  rate 
will  oe  restricted.  This  scheme  is  highly  non-adaptive.  It  reacts  to 
inputs  but  not  to  congestion. 

In  the  case  of  voice,  control  must  be  much  tighter,  because  only  very 
small  delays  can  be  tolerated.  Bially  et  [31]  initiated  a  voice  flow 
control  scheme  that  is  based  on  trading  voice  quality  with  packet  length. 
Essentially,  a  packet  can  be  shortened,  maintaining  intelligibility, 
although  at  a  lower  quality.  In  this  particular  suggestion,  packets 
travelling  on  a  congested  link  were  shortened  at  the  link's  input,  thus 
relieving  the  congestion.  The  waste  in  netv/ork  resources  caused  by 
shortening  a  long  packet  after  traversing  some  links  is  apparent. 

Jaffe  [32]  and  Hayden  [33]  independently  suggested  end  to  end  schemes 
that  overcome  this  difficulty.  In  essence,  in  both  schemes  the  link  which 
is  the  bottleneck  divides  its  resources  equally  among  all  conversation 
traversing  it.  Hayden's  schemes,  unlike  Jaffe's,  is  suitable  for 
quasi-static  operation,  but,  unfortunately ,  it  suffers  from  bad  transient 
behavior.  In  order  to  make  this  behavior  less  violent,  large  "safety 
coefficients"  must  be  employed,  thus  wasting  resources. 

1.2.3  Combined  Routing  and  Flow-Control 

In  both  attempts  to  analyze  flow-control  and  routing  together, 
discussed  below,  flow-control  was  done  using  the  window  scheme. 


14 


In  Gerla  and  Nilsson  [34],  minimum  delay  routing  was  analyzed  in 
conjunction  with  fixed  windows.  It  is  claimed  that  minimum  delay  routing 
will  maximize  throughput.  An  algorithm  is  proposed  to  find  the  optimum, 
but  no  convergence  proof  is  given. 

Golestaani  [35]  was  the  first  to  view  flow  control  and  routing  as  a 
combined  optimization  problem  in  the  space  of  routing  together  with  inputs. 
In  his  scheme,  after  better  routing  and  inputs  are  found,  flow  control 
variables  which  will  give  rise  to  the  desired  inputs  are  to  be  computed. 
This  obviously  assumes  the  knowledge  of  the  delay  function,  an  assumption 
that  might  be  objectionable. 

Gal  lager  [12]  has  tried  to  dispense  with  the  above  assumption  by  using 
Golestaani' s  framework,  but  operating  directly  in  the  space  of  routing  and 
flow  control  variables.  The  particular  algorithm  suggested  is  not  complete 
in  that  a  descent  iteration  is  given  that  adjusts  only  the  windown  size 
variables. 

Realizing  this  difficulty  Gallager  [36]  suggested  the  use  of  a  window 
for  every  pair  of  session-path.  In  such  a  case,  the  windows  determine  the 
path  flows.  This  allows  us  to  achieve  routing  and  flow  control  together 
using  only  windows  as  control  variables  and  thus  avoiding  the  difficulty 
mentioned  above.  Still,  the  resulting  algorithm  has  a  poor  rate  of 
convergence.  A  more  important  side  effect  that  followed  from  the  above 
realization,  was  the  observation  made  there  [36],  that  a  window  for  every 
pair  of  session-path  may  be  the  way  to  exercise  dynamic  routing. 

1.2.4  Integrated  Voice  and  Data  Network 

Some  suggestions  have  been  made  as  to  the  way  of  achieving  this 
integration  [37],  [42] .  Recently,  Ibe  [38]  proposed  the  modeling  of  a 
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store  and  forward  integrated  network  with  voice  packets  having  priority 
over  data  packets.  This  is  done  by  considering  voice  and  data  as 
different  "modes".  It  essentially  transforms  the  network  into  one  that 
consists  of  two  duplications  of  the  original  network.  Ont  duplicate 
carries  the  voice,  the  other  the  data.  Furthermore,  the  delay  on  a  given 
link  is  also  affected  by  the  flow  on  the  duplicate  one.  Ibe  then  uses 
Golestaani's  scheme  for  the  transformed  network. 


Summary  of  Accomplishments 


1.3.1  Introduction 

In  the  chapters  that  follow  we  propose  four  algorithms,  two  for 
routing  and  two  for  flow  control.  Which  algorithm  to  choose  for  which  user 
or  network  depends  mainly  on  the  underlying  rules  by  which  a  particular 
network  serves  a  particular  user.  Such  rules  for  example  might  require  the 
use  of  a  virtual  circuit  per  session  or  might  require  that  once  a  session 
is  in  progress  no  rerouting  will  be  done. 

A  basic  distinction  among  the  algorithms  is  whether  they  are  system 
or  user  oriented.  The  former  takes  a  global  objective  function  and 
minimizes  it.  Put  another  way,  it  takes  into  account  the  amount  of  network 
resources  a  use*"  utilizes.  The  latter  is  more  concerned  with  the  resulting 
service  the  end  user  perceives.  THs  resulting  service  should  have  a 
fairness  property  to  it.  A  generic  example  which  brings  forth  the  dif¬ 
ference  between  the  two,  is  the  network  deoicted  in  Figure  (l.A) 
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There  are  seven  users  in  the  network.  Six  of  them  use  only  one  link  each, 
and  user  1  uses  all  the  six  links.  Let  all  link  capacities  be  equal. 
Obviously  if  the  network  is  highly  utilized  an  increase  in  the  a  service 
to  user  1  will  come  at  the  expense  of  the  detrioriation  of  service  to  the 
other  six  users.  A  system  oriented  algorithm  will  take  this  into  account 
when  allocating  resources,  while  a  user  oriented  algorithm  will  not. 

In  this  thesis  we  take  the  point  of  view  that  the  different  perfor¬ 
mance  measures  by  which  the  delivery  of  voice  as  opposed  to  date  is  charac¬ 
terized,  as  well  as  the  underlying  network  rules  that  might  be  implied, 
make  the  user  oriented  algorithms  more  suitable  for  voice  and  the  others 
more  suitable  for  data. 

The  reader  is  encouraged  to  look  in  each  chapter  for  the  abstract 
setting  in  which  the  chapter  is  cast,  and  view  the  voice  or  the  data 
framework  only  as  a  possible  instance  of  a  user  for  whom  this  setting  is 
suitable. 

1.3.2  Voice  Routing 

We  consider  routing  each  voice  session  on  a  virtual  circuit.  The 
virtual  circuit  is  fixed  throughout  the  duration  of  the  session.  We  show 
that  provided  the  capacity  required  by  each  session  is  relatively  small, 
placing  an  incoming  session  on  the  path  of  minimal  marginal  cost  is  optimal 
in  an  asymptotic  sense.  Moreover  we  consider  the  case  where  the  path  is 
not  updated  upon  every  new  session  arrival  but  is  updated  periodically.  We 
show  that  this  updating  rule  is  also  optimal  in  an  asymptotic  sense. 

1.3.3  Voice  Flow  Control 

We  propose  a  bottleneck  algorithm  similar  to  Hayden's  and  Jaffe's. 

We  show  the  difficulty  that  Hayden's  algorithm  encounters,  and  the  reason 
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ours  does  not.  We  prove  convergence  and  extend  the  algorithm  beyond 
Jaffe's,  in  that  we  are  able  to  handle  nonlinearities. 

1.3.4  Data  Routing 

We  exploit  the  simple  structure  of  the  Hessian  matrix  of  the  cost 
function  with  respect  to  path  flows  as  opposed  to  the  space  of  fractions. 
This  structure  allows  for  the  distributed  multiplication  of  the  Hessian 
matrix  by  a  vector.  Hence  we  show  that  we  can  compute  the  Newton  direction 
and  the  Newton  stepsize  distributedly  for  the  relaxed  problem  in  which  tne 
positivity  constraints  on  the  path  flows  are  removed. 

We  propose  an  algorithm  by  which  we  essentially  project  the  relaxed 
direction  on  the  constrained  set.  This  projection  can  be  done  with  respect 
to  a  metric  which  matches  the  simplex  constraints  involved  in  the 
multicommodity  flow  problem.  The  algorithm  is  a  general izaton  of  the  one 
proposed  by  Bertsekas  [6]  for  an  orthant  constraint. 

We  prove  that  the  algorithm  converges  superli nearly  with  a  stepsize  of 
unity  when  started  close  enough  to  the  optimum,  similar  to  the  pure  Newton 
method.  Together  with  the  possibility  of  computing  the  Newton  direction 
distributedly  this  makes  the  algorithm  a  prime  candidate  for  quasi -static 
operation. 

We  state  and  analyze  the  algorithm  in  complete  generality  and  then 
specialize  to  the  multicommodity  problem.  In  addition,  we  present 
computational  results  based  on  three  sample  metworks. 

1.3.5  Data  Flow-Control 

We  propose  an  iterative  algorithm  to  adjust  the  parameters  of  the 
window  flow-control  scheme  to  achieve  a  desired  input  rate.  This 
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algorithm,  based  on  an  observation  made  by  Gallager  [12],  Is  guaranteed  to 
converge  under  mild  assumptions  concerning  the  relation  between  link  flow 
and  link  delay,  and  does  not  require  the  knowledge  of  this  relation. 

Furthermore,  we  propose  a  model  to  Investigate  the  effect  of  flow 
control  on  routing.  Using  this  model,  we  show  that  the  assertion  made  In 
[36]  about  accomplishing  dynamic  routing  using  flow-control  mechanisms,  is 
plausible. 

1.3.6  Integrated  Network 

We  propose  a  way  to  Integrate  the  four  algorithms  suggested  in  this 
Thesis  In  a  network  serving  voice  and  data  simultaneously  The  joint  flow- 
control  and  routing  for  data  Is  made  In  the  spirit  of  the  out  proposed  by 
Golestaani  [35].  The  joint  flow-control  and  routing  for  voice  Is  made  In  a 
way  that  achieves  a  fair  allocation  of  rates  over  the  capacity  allocated  to 
it.  We  integrate  the  two  joint  flow-control  and  routing  algorithms  by 
giving  priority  to  voice  and  proposing  an  algorithmic  method  to  dynami¬ 
cally  allocate  capacity  to  the  voice  so  as  to  achieve  a  tradeoff  between 
voice  rate,  and  data  rate  and  delay.  In  some  sense,  this  is  similar  to  the 
algorithm  proposed  by  Ibe  [38]. 
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* 

2  Voice  touting 
2.1  Introduction 

In  this  chapter  we  consider  a  network  in  wMch  voice  sessions  are 
taking  place  between  various  origin-destination  (or  OD)  pairs  of  nodes. 

Each  actual  two-party  voice  conversation  is  treated  as  two  sessions. 

Each  session  is  allocated  a  virtual -circuit  (VC)  at  the  time  of  setup, 
which  lasts  for  the  whole  duration  of  the  session.  All  sessions  com¬ 
municate  at  a  certain  fixed  rate.  The  routing  problem  we  address  is  to 
allocate  paths  to  sessions  at  the  time  of  set  up,  so  as  to  minimize  a  cer¬ 
tain  objective  function. 

Specifically,  we  have  in  mind  a  large,  ARPANET  like,  network  in  which 
an  individual  voice  session  is  not  significant.  It  is  rather  the  sessions 
as  a  group  that  affect  the  network.  More  precisely,  we  assume  that  the 
communication  capacity  utilized  by  each  session  is  small  relative  to  the 
link  capacity  provided  by  the  network;  yet,  the  number  of  sessions  between 
each  OD  pair  is  large  enough  to  make  the  cumulative  rate  of  communication 
between  OD  pairs  significant. 

We  will  analyze  a  routing  rule  tailored  to  this  context.  By  this  rule, 
all  Incoming  sessions  in  an  interval  of  time  are  assigned  a  fixed  path. 

This  path  is  changed  periodically  in  order  to  adapt  to  the  changing  flow 
pattern  in  the  network,  as  old  sessions  terminate  and  new  sessions  arrive. 
Thus  this  rule  avoids  long  computation  or  the  fast  measurements  that  would 
be  needed  by  a  routing  rule  which  would  try  to  react  to  sessions  on  a  more 
"individual"  basis.  Taking  into  account  the  fast  rate  of  arrival  we  envi- 

*The  problem  in  this  chapter  was  suggested  by  A.  Segal  1 
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sage,  it  is  apparent  that  doing  session  by  session  routing  decisions  is 
impractical . 

The  resulting  situation  is  similar  to  the  quasi-static  routing  of 
Chapter  4.  The  hope  is  that  the  variation  of  the  flow  pattern  in  the 
network  affected  by  the  terminating  and  arriving  sessions  is  slow  enough, 
relative  to  the  update  period,  so  that  our  rule  reacts  fast  enough  to 
keep  the  flow  not  far  from  the  optimum  at  all  times. 

In  the  next  section  we  present  the  objective  to  be  achieved  by  the 
routing  rule.  We  discuss  the  difficulty  of  analyzing  any  particular  instance 
of  the  problem  and  subsequently  motivate  the  analysis  of  a  qualitative  asymp¬ 
totic  behaviour.  In  the  section  that  follows  we  present  the  routing  rule, 
and  in  the  last  section  we  prove  the  result  about  its  asymptotic  behaviour. 
2.2  The  Voice  Routing  Problem 

We  consider  a  network  consisting  of  N  nodes  1,2 . N  and  a  set  of 

directed  links  denoted  by  z.  We  are  given  a  set  W  of  OD  pairs.  For  each 

OD  pair  weW,  we  are  given  a  set  of  directed  paths  P  that  start  at  the 

w 

origin  node  and  terminate  at  the  destination  node.  Let  X  denote  the  mean 

w 

rate  of  session  arrivals  at  OD  pair  weW,  let  y  denote  tne  communication 
rate  required  by  each  session,  and  let  each  session  have  an  average  holding 
time  y  .  It  is  assumed  that  each  of  the  arrival  processes  is  an  indepen¬ 
dent  Poisson  process,  and  the  holding  time  of  each  session  is  an  indepen¬ 
dent  exponentially  distributed  random  variable. 

Let  the  random  variable  xp(t)  denote  the  number  of  sessions  between  OD 

pair  weW,  traversing  path  peP  at  time  t.  To  describe  the  evolution  of 

w 

xp(t)  let 


{ex(s))xe[0,») 


and 


e^f  s) 


i  1,2,... , 


T  pePw,  weW,  Y  s>0 
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be  a  collection  of  independent  stochastic  processes  taking  value  in  {0,1} 


and  satisfying  for  all  t^  >  t  >0 


l>l«J(t1)*0/e«(t2)-0].p[^P(t1)-0/eP(t2)-0]-l 

7  Xe[0,«),  i=l,...,xP(0)  pePw,  weW. 

Assume  that  at  time  zero  the  value  of  all  the  processes  is  one.  Let  A  (t)  be 

w 

the  counting  process  describing  session  arrivals  at  weW,  and  let  our  control 

be  the  vector  random  variable  U(t)  whose  entries  uP(t)  peP  ,  weW  satisfy 

w 

uP(t)e{0,l}  7  pePw,  weW,  7  t>0  (2.2.1) 

l  ”P(t)  =1  7  weW  7  t>0.  (2.2.2) 

p£Pw 

Then,  for  ail  peP  ,  we  define  xP(t)  by 
w 

xP(0)  t 

x  {t)  =  l  ei(t)  +  /  u  (s)«e?(t-s)d(Aw(s))  7  weW  7  t>0.  (2.2.4) 

i  =  l  0 


Notice  that  if  X  denotes  the  set  defined  by  (2.2.1)  -  (2.2.2)  then  because 
U(t)  eX 

p  xP<°) _p  w 

I  X  (t)  -  l  X  ‘5i(t)  +  L  es(t-s)d(Aw(s))  7  weW,  7  t>0  (2.2.3) 

peP  peP  1=1  1  0  3 

w  r  w 

and,  thus,  the  total  number  of  sessions  between  any  0D  pair  at  any  time  is 
independent  of  our  control. 

For  every  link  ae£  we  can  determine  fa(t),  the  flow  on  link  ae£  at  time 
t,  by  the  relation 
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f3(t)  =  l  l  1  (a)xP(tb  ■*  ae£  (2.2.5) 

WeW  peP  P 
w 

where  1  (a ) =1  if  path  p  contains  the  link  a  and  1  (a ) =0  otherise.  If  we 
P  p 

denote  by  x(t)  and  f(t)  the  vectors  of  the  number  of  path  sessions  and  link 
flows  respectively,  at  time  t,  then  we  can  write  relation  (2.2.5)  as 

f(t)  =  'Ex(t)  •  y  (2.2.6) 


where  T  is  the  arc-chain  matrix  of  the  network. 

In  modeling  a  packet  switched  network  when  y-1  is  long  enough  equation 
(2.2.6)  will  hold  as  an  average  over  an  interval  of  time  around  t.  As  will 
be  described  shortly,  we  are  concerned  with  an  "average"  behaviour  objec¬ 
tive.  In  light  of  this  objective  writing  (2.2.6)  as  an  instantaneous  rela¬ 
tion  is  justified. 

It  is  customary  to  take 

0(f)  =  I  Da[fa(t)]  Y  t>0 

aef 


to  be  an  instantaneous  measure  of  congestion  in  the  network,  [35],  where 
( • )  is  a  certain  function  to  be  specified  later.  Thus,  a  reasonable 
overall  objective  to  be  achieved  by  the  voice  routing  is  to  minimize 


1  T 

lim  E  T  /  D[f (t)]dt 

T+oo  g 


(2.2.7) 


over  all  U( * )  such  that  U(t)eX  where  f(t)  satisfies  (2.2.5)  and  (2.2.3). 

Even  a  very  simple  instance  of  problem  (2.2.7),  in  which  sessions 
arrive  simultaneously  at  different  0D  pairs,  one  session  at  a  certain  0D 
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pair  and  all  the  others  at  another,  with  no  later  arrivals  expected,  is  a 
very  difficult  problem.  Indeed,  by  taking  an  appropriate  function  for 
D  (•),  a<-£  it  can  be  shown  that  the  problem  we  face  is  equivalent  to  the 

d 

two  commodity  flow  problem  shown  by  Even,  Itai  and  Shamir  [28]  to  be  an 
NP-complete  problem  [42]. 

Nonsimplified  instances  of  (2.2.7)  are  more  difficult  by  an  order  of 
magnitude.  Not  only  the  "interaction"  between  paths  are  to  be  taken  into 
account,  but  the  different  rate  of  arrivals  to  different  OD  pairs  should 
play  their  role,  too. 

Yet,  the  simple  instance  of  problem  (2.2.7)  above  has  a  simple  solution 

if  the  rate  of  communication  of  a  session,  y.  is  taken  to  be  small  enough. 

The  optimal  routing  decision  in  this  case  is  to  allocate  the  paths  which 

yield  minimal  marginal  costs,  i.t.  shortest  paths  with  respect  to  D'(  ), 

61 

ae£.  Since  this  is  exactly  the  situation  we  have  in  mind,  it  is  expected 
that  routing  on  the  marginal  delay  shortest  paths  is  "robust"  in  the  sense 
that  it  will  yield  a  cost  close  to  the  optimal,  even  for  the  nonsimplified 
problem. 

As  for  the  choice  of  0  (fa),  ae£,  in  [35]  the  function 

a 

D  (fa)  = - - -  (2.2.8) 

a  C  -  fa 

a 

is  suggeci.-d  for  data.  In  Chapter  6  we  will  motivate  another  function 

suitable  for  voice.  Noticing  from  (2.2.4)  that  we  have  no  control  over  the 

total  flow  emanating  from  an  origin,  this  choice  of  D  (fa)  will  result  in 

61 

cost  of  infinity  in  (2.2.7)  independent  of  the  routing.  This  is  due  to  the 
nonzero  probability  that  the  number  of  sessions  currently  in  progress 
exceeds  the  network  capacity.  In  practict,  when  congestion  builds  up,  the 
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flow  control  mechanism  will  block  new  sessions  from  entering  the  network. 
Thus  in  order  to  deal  with  a  meaningful  routing  problem,  and  yet,  avoid  the 
modeling  of  the  flow  control  mechanism,  we  take  D  (fa),  as*  to  be  a  quadra- 
tic  extrapolation  to  (2.2.8)  by  making  the  following  assumption  concerning 

DM: 

a 

Assumption  (2. A):  For  all  as*,  D  (•)  is  convex  twice  continuously  differ- 

d 

entiable  increasing  on  [o,»)  with 

Da(0)  >  _n  Y  as* 

and 

°  c  n  <  D"(fa)  <  n  <  -  Y  fae[0,«),  Y  as*. 


Evi dently ,  any  quadratic  extrapolation  to  (2.2.8)  satisfies  Assumption 

(2. A). 

2.3  The  Algorithm  for  Voice  Routing 

Following  the  observation  made  in  the  previous  section  we  propose  this 
routing  rule:  Every  At  units  of  time  each  00  pair  updates  its  shortest 
path  with  respect  to  0  (fa(t))  as*.  In  between  updates  all  new  incoming 
sessions  are  routed  on  the  most  recent  shortest  path,  while  a  session  which 
has  arrived  before,  continue  to  use  the  path  allocated  to  it  upon  arrival, 
until  it  terminates. 

Mathematically,  we  choose  U(t)sX,  with  QP ( t )  satisfying  for  pePw 


where 


/ 


{0} 


if 


p  =  arg  min 
PePw 
otherwi se 


(2.3.1) 


dp(t)  =  l  o'(fa(t))-l  (a) 
as*  a  P 


(2.3.2) 
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and  l_z  J  denotes  the  largest  integer  smaller  than  z. 

With  such  a  control  the  pair 

{x(kAt  +  5  ) ,x(kAt) }  r  0  <  5  <  At  (2.3.3) 

becomes  an  imbedded  Markov  process.  Moreover,  it  is  not  difficult  to  see 
that  it  is  positive  recurrent  and  as  a  result  a  steady  state  distribution 
of  x(t)  exists.  This  implies  that  with  the  control  U(  • )  the  limit  in 
problem  (2.2.7)  is  well  defined. 

The  kind  of  result  we  will  prove  in  the  next  section  is  that  as  the 
arrival  and  departure  of  actual  flow  to  and  from  the  network  becomes 
smoother  in  a  sense  to  be  defined  shortly,  and  At  becomes  smaller,  the 
resulting  value  of  expression  (2.2.7)  approaches  the  limit  infimum  that  can 
be  achieved  by  any  control  TT(*)  for  which  THtleX  is  a  random  variable  for 
all  t  >  0. 

Formally,  we  let 

^w  =  ^"w*n  >  Y  =  Yw /n  (2.3.4) 

for  certain  positive  constant  7  and  and  we  state  the  following  propos- 
ition  below  in  terms  of  n.  We  require  n  large  so  that  individual  session 
arrivals  do  not  imbalance  the  routing,  and  At  to  be  small  so  that  large 
number  of  session  arrivals  routed  on  the  same  path  do  not  imbalance  routing. 
Propositon  (2. A)  Under  Assumption  (2. A)  as  At+0  and  n+»  the  cost  of 
(2.2.7)  resulting  from  the  control  defined  by  (2.3.1)  approachs  the  minimal 
possible. 

2.4  Convergence  Proof 

We  first  find  a  lower  bound  to  (2.2.7)  and  then  show  that  the  cost 
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resulting  from  the  control  U(  • )  as  nt»  and  At 4-0  approaches  this  bound. 

To  bound  (2.2.7)  for  any  control,  we  observe  that  by  Jensen's  inequal¬ 
ity  and  Fubini's  Theorem 

lim  inf  E  L  /TD(f(t))dt  >  lim  inf  D  d  /TE(f(t) )dt) .  (2.4.1) 

T>»  T  o  X-*-00  I  0 

On  the  other  hand,  from  (2.2.4)  using  the  facts  that 
xP(0) 

lim  l  J  -ef(t)  =  0  (a. s)  T  weW  (2.4.2) 

t+°°  peP  i  =  l  1 
w 

and 

t  w  -1 

lim  E/  es(t-s)d(Aw(s) )  =  xw-p  Y  weW  (2.4.3) 

t-K«  0 

where  (2.4.3)  follows  from  Little's  formula  [40 j ,  we  get  that  for  each 

initial  condition  xp(0),  peP  ,  weW,  and  e>0  there  exist  T  such  that  for  all 

w 

t>T 


-r  / [ Exp ( t )  ]dt  >  0 
z  0 


T  weW,  Y  t>t  (2.4.4) 

Y  pePw,  weW,  Y  t>T.  (2.4.5) 


Consider  the  problem  in  Rlpl  where  IPI  =  l  IP  I  of 

uieW  w 


minimize  D(TTy) 

(2.4.6) 

subject  to 

l  yp  =  Vy_1,T 

pePw 

T  weW 

(2.4.7) 

yP  >  0 

T  pePw,  weW. 

(2.4.8) 

Denote  its  solution  by  D(lTy). 

Then  by  Assumption  (2. A)  it  is  not 

hard  to 

see  that  the  solution  v(e)  to  the  problem 
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min  D(Ey) 
subject  to 

I  ^*Y  -  e  T  weW 

peP  w 

w 

p 

y >'  0  V  pePw,  weW 


is  "stable"  [43]  for  e=0  in  the  sense  that  the  solution  to  the  problem  is 
continuous  with  respect  to  perturbations  in  the  problem  parameters,  and 
that  the  equality  in  (2.4.7)  can  be  replaced  by  an  inequality.  As  a  result 
we  deduce  that 


lim  v(e)  =  D(Ey) . 
e+0 


(2.4.9) 


Thus  taking 

i  t 

{  /E(rxp(t))dt 
0 

in  (2.4.4),  (2.4.5)  to  play  the  role  of  y,  we  conclude 

i  T 

lim  inf  E  |  ^D(f(t))dt  >  D(T) 


(2.4.10) 


where 


T  =  Ty  . 


Notice  that  since  D(*)  is  strictly  convex  T  is  unique. 

We  now  show  that  the  cost  resulting  by  using  the  control  U(  •) 
approaches  D(7)  as  n+»  and  At+0. 

To  this  end  we  investigate  the  behavior  of  the  imbedded  Markov  process 
{ x( k  At) } . 


We  take  a  convex  Lyapunov  function  V(x),  such  that  Q(a)  =  V(a(x1-x2)) 
is  strictly  convex  provided  that 
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*(xfx2)  t  0 

and  for  which 

V*  =  min  V(x)  =  D (7)  (2.4.11) 

x>0 

where  for  each  x  which  attains  the  minimum  we  have 

7  =  Txy. 

We  then  show  that  for  Tx(kAt)Y  outside  a  neighborhood,  shrinking  to  a 
single  point  as  ntO  and  At+0,  of  7  we  have 
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Lei  n  ,  weW  be  the  Lagrange  multipliers  to  equations  (2.4.7)  in  problem 
w 

(2.4.6).  It  is  well  known  [41]  that  the  function  V ( x )  defined  by 


V(x)  =  D(Exy)  +  l  [n  (  l  yxP  -  X  y“^y)  +  n(  l  y*P  -  X  y-ly)2] 

weW  w  peP  w  peP  w 

w  w  (2.4.14) 

p 

satisfies  (2.4.11).  Since  in  what  follows  x  >0  T  pePw  weW,  we  consider 
only  neighborhoods  relative  to  the  positive  orthant.  With  respect  to  V(x) 
we  have  the  following  lemmas,  whose  proofs  are  presented  in  Appendix  (2. A). 
Lemma  (2. A):  For  x(kAt)  such  that  xP(kAt)  >  0  for  all  peP  ,  weW  we  have 

-  r  w 

E  Cv'V(x(kAt) )-  (x [ (k+1) At J  -  x(kAt))  /<(kAt)] 


where  x  solves 


(1  -  e  wAt)  W(x(kAt) ) '  •  (x  -  x(k At) ) 


min  7V(x(kAt) )'  (x  -  x(kAt)) 


subject  to 


l  XP=  Xwy 

pePw 

XP  >  0 


Y  weW 


X  >  0  Y  pePw,  weW  . 

Using  the  convexity  of  Y(x)  we  can  immediately  deduce  the  corollary 
that  follows  by  the  subdifferential  relation: 

V(y)  >  V(x)  +  W( x) ' (y  -  x)  Y  y,x. 

Corol lary  to  Lemma  (2. A): 


E  [  vV(x(kAt) ) 1  •  ( x[  (k+1 )  At]  -  x(kAt))  /x(kAt)]  < 


Lemma  (2.B) 


(1  -  e ’MAt) • [V*  -  Y( x(kAt) ) ] 


E  [  nx[  (k+1)  At]  -  x(kAt)  ir/x(kAt)  ]  < 
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i  l  |[xP(k4t)(l  -  + 

WeW  peP 

w 

xP(kAt)e“pAt(l  -  e“pAt)}  + 


i  v 


_1  1  -  e“uAt 


2  -uAt  2 
+  (Xwu  )  (1  -  e  )  + 


\  p_1(1  -  e‘^dt)  1  -  1  e~ 
W  pAi 


Now,  by  Assumption  (2. A)  and  the  definition  (2.4.12)  it  is  not  difficult 
to  see  that  W(x)  is  Upschitz  continuous  with  module  where 


iiTTii  =  max  { flTy «  1  iiyii  <  1} 


(notice  that  H Eh >1}  and  therefore  by  the  Taylor  Expansion 


V[x( (k+1) At) J  -  V[ x( k At)  ]  y/x(kAt) 


E  v' V( x(k At) )*(x( (k+1) At)  -  x(kAt) )  / x(kAt)  + 


—  2  2 

7  nEii  y  »E  u x[ (k+1) At] 


-  x(kAt)  !!^/x(k  At) 


Let  M  be  given  by 


M  =  min  l  {nw  (  I  yy 
y  WeW  pePw 


-1  _  p  -12 

xw*p  y)  +  n  (  )  yy  -  xwp  y)  } 

pePw 


then  by  Assumption  (2. A)  and  (2.4.14) 


l  l  yxP(kAt)  <  ~  V(  x( k At) )  -  M 
WeW  peP^  n 


T  k>0  (2.4.15) 
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T  k>0.  (2.4.16) 


l  l  Y2*xP{kAt)2  <  ^  V(x(kAt)  )  -  M 
weW  pePw  — 


Using  the  Corollary  to  Lemma  (2. A),  Lemma  (2.B)  and  (2.4. 15) -( 2.4.16) 
in  the  expansion  above  we  get 

E  {V  [x( [k+1 ] At) ]  -  V(x(kAt) )/x(kAt) }  (2.4.17) 


<  (i  .  e"yAt)  [V*  -  VCx(kAt) )  J  + 


-  u  At  2  -  u  At  -  y  At 

£  nr#2  ILzJ. _ L  _ _ h _ ix  ( v ( x ( k At) )  -  m 

2  n 


"^r 


+  i 

weW 


,,  -1,  (1  -  e“pAt)  2  ,  -142..  -yAt,2 

(Vy  J  “ TZt - +  (AwYU  Ml  -  e  ) 


+  (x  yu“M(1  -  e'^^tjd  -  1  -  e  ) » y 
w  niAt 


-  (1  -e‘yAt)  i  V*  -  V(x(kAt) ) • 


.  n  ^  2.1  -  e"yAt 

1  -  7  «r«  ( — - — 


Y • p“ U At 
+  — — ) 


•?  IFI2 


-yAt 

1  -  e  Y.e-uAt 


+  l 

weW 


UWYM  1)*Y  +  (XWYP  1)2(1  -  e"UAt) 


-yAt 


+  (X  YU-1)  ( 1  -  - ).Y 


w 


yAt 


<  (1  -  e‘uAt) 


V  -  V(x(kAt))(l  -  C]_(n,At))  +  c2(n,At) 


where,  using  x  y  =  *  *Y  for  all  n, 
w  w 
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11m  c  (n,At)  =  lim  c  (n,At)  =  0. 

At+0  1  at+Q  2 

n+«  n+» 

This  follows  because 

lim  y  =  0,  lim  (1  -  e_)jAt)  =  0. 

n+»  At+0 

Consider  the  set 

F(e)  =  {f  I  f=Exy  V(x)< V  +  e  xP>0  T  pePw  weW}. 

By  the  construction  of  V  we  have  that  for  any  5>Q  there  exist  7  such  that 
for  all  e<7 


{f  I  IT  -  f  ll<5}3  F(e). 


Mow,  if  we  take  e<e,  n  large  enough  and  At  small  enough  such  that 

V*  +  c2(n, At) 

I  -  ci (n, At) 


2  "'■■■ 


we  conclude  that  for  all  x(kAt)  such  that  f/t  F(e),  f  =  7x(kAt)y  we  have 

E  [V(x([k+l]At)  -  V( x(k At) )/x(k At)  ]  <  -(1  -  e_tjdt)*  (2.4.18) 

Also,  relation  (2.4.17)  implies  that 

lim  E  [V(x([k  +  1 ] At) )  -  V( x(k At) )/x(k At) ]  =  0  (2.4.19) 

At+0 
nf» 

uniformly  in  x(kAt)  in  a  bounded  set.  Relations  (2.4.18)  and  (2.4.19) 
prove  the  propositon.  Q.E.O 
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3.  Voice  Flow-Control 


3.1  Introduction 

Recently  a  group  at  Lincoln  Laboratory  [31]  has  introduced  an  interest¬ 
ing  Voice  Coder  (Vocoder)  scheme.  The  proposed  Vocoder  uses  a  digitiza¬ 
tion  method,  dubbed  "embedded  coding",  which  was  first  proposed  by  the 
Naval  Research  Laboratory  [47].  Essentially,  a  segment  of  talkspurt  is 
coded  into  packets  of  different  "priority"  levels.  The  higher  "priority 
packets  contain  the  "core"  of  the  speech  while  the  lower  priority  packets 
contain  the  information  that  "fine  tunes"  it.  This  coding  schemes  allows 
for  the  implementation  of  a  sophisticated  flow  control  mechanism. 

While  traditional  voice  flow  control  mechanisms  use  blocking  either  by 
preventing  the  initiation  of  a  call  or  by  discarding  small  segments  of  it 
when  the  call  is  already  in  progress,  the  "embedded  coding"  scheme  allows 
for  the  alleviation  or  prevention  of  congestion  by  dynamically  trading  off 
between  voice  quality  and  congestion,  by  discarding  the  lower  "priority" 
packets  either  at  the  point  of  congestion  or  the  point  of  entry.  Thus,  in 
contrast  with  the  traditional  schemes  which  convey  only  part  of  the 
message,  each  part  at  high  quality,  the  "embedded  coding"  scheme  preserves 
the  continuity  of  the  speech  with  some  overall  degradation.  The  level  of 
congestion  at  which  the  gaps  between  the  segments,  delivered  by  the  tradi¬ 
tional  schemes,  render  the  speech  unintelligible  is  much  lower  than  the  one 
at  which  the  "embedded  coding"  scheme  delivers  unintelligible  information. 
This  flexibility  in  exercising  flow  control  makes  the  embedded  coding 
scheme  attractive. 

Alleviation  and  prevention  of  congestion  by  discarding  lower  priority 
packets  at  the  point  of  entry  seems  to  be  superior  to  discarding  them  at 
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the  point  of  congestion.  The  later  amounts  to  a  waste  of  network  re¬ 
sources.  But,  it  would  not  be  advisable  to  forgo  the  capability  of  dis¬ 
carding  lower  priority  packets  at  the  point  of  congestion,  because  of  the 
time  delay  involved  in  making  the  entry  points  aware  of  the  congestion 
build-up  situation.  As  a  result,  we  advocate  the  use  of  the  two  capabili¬ 
ties  in  complementary  roles.  The  rates  at  the  entry  points  will  be  deter¬ 
mined  upon  longer  time  averages  of  congestion  levels  while  the  capability 
of  discarding  packets  at  the  point  of  congestion  will  serve  to  alleviate 
intolerable  momentary  congestion.  The  rates  at  the  entry  point  will  be 
adjusted  so  that  the  capability  of  discarding  packets  at  the  point  of 
congestion  will  not  be  exercised  too  often.  This  is  analogous  to  the 
complementary  roles  of  quasi -static  routing  and  window  flow-control  of 
data. 

In  this  chapter  we  discuss  a  method  of  determining  the  input  rates  at 
the  entry  point.  To  this  end,  we  will  ignore  the  capability  of  discarding 
packets  within  the  network  in  order  to  avoid  the  "interaction"  issues. 

These  issues  will  be  taken  up  in  the  final  chapter. 

As  in  quasi-static  routing,  we  are  looking  for  an  algorithm  that  will 
adapt  the  input  rate  to  the  changing  flows  in  the  network  resulting  from 
the  initiation  and  termination  of  sessions.  As  in  quasi-static  routing,  we 
employ  an  "on  line"  iterative  algorithm  that  will  solve  a  static  problem. 
The  hope  is  that  the  algorithm  converges  fast  enough  relative  to  the 
sessions  initiation  and  termination  process,  and  as  a  result  will  be  able 
to  "track"  its  variation,  keeping  the  rates  in  the  ballpark  of  the  optimal 
rates  at  all  times. 

The  criterion  used  to  determine  input  rates  is  one  of  the  central 
issues  in  this  chapter.  In  Section  3.2  we  introduce  the  notion  of  "fair 
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allocation".  In  Section  3.3  we  present  two  previous  algorithms  after  which 
our  algorithm  is  fashioned.  In  the  following  section  we  present  and 
motivate  the  exact  mathematical  problem  we  intend  to  solve.  Next,  we 
introduce  the  algorithm,  prove  its  convergence,  and  finally  discuss  issues 
arising  from  the  need  to  implement  the  algorithm  in  a  distributed  manner. 

Keeping  in  the  back  of  our  mind  the  compatibility  issues  among  the 
algorithms  presented  in  this  Thesis,  we  discuss  the  input  rate  allocation 
problem  in  the  context  whereby  routing  for  each  session  is  done  on  a 
virtual  circuit,  fixed  for  the  duration  of  the  session.  A  heuristic 
virtual  circuit  allocation  rule  was  discussed  in  the  previous  chapter. 

3.2  Fair  Allocation 

In  this  section  we  discuss  the  criterion  for  determining  input  rates. 
Although  the  total  of  the  input  rates  that  should  be  allocated  might  be 
debatable,  being  a  tradeoff  between  throughput  and  delay,  it  is  unde- 
batable  that  the  individual  session  rate  within  the  generally  limited  total 
rate,  should  be  allocated  in  a  fair  manner. 

It  is  customary  to  consider,  as  one  of  the  characteristics  of  a  fair 
allocation,  the  feature  that  it  is  indifferent  to  the  geographical  separa¬ 
tion  of  the  session's  origin  and  destination.  Although  there  might  be  dif¬ 
ferent  priorities  assigned  to  sessions,  these  priorities  are  not  assigned 
on  the  basis  of  geographical  separation.  Moreover,  two  sessions  of  the 
same  priority  should  obtain  the  same  rate,  if  the  rate  of  one  can  be  traded 
for  the  rate  of  the  other.  This  is  in  the  spirit  of  making  the  network 
"transparent"  to  the  user.  The  user  should  have  no  idea  of  the  length  of 
the  path  assigned  to  him  through  the  rate  allocated  to  him. 

To  capture  the  notions  of  fairness  and  priority  as  presented  above,  we 
define  the  notion  of  "fair  allocation": 
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Let  Q  be  a  totally  ordered  set  (i.e.  for  all  aeQ,  beQ,  a^b  we  have  a<b 

or  a>b)  and  let  X  be  a  given  subset  of  Qn.  A  vector  b  =  (b* . bn)'  is  said 

to  be  lexicographically  less  than  or  equal  to  the  vector  d  =(d' . d‘:)'  if 

b1  >  d1  implies  the  existence  of  j  <  i  such  that  bJ  <  cf3 .  The  vector  x 

=  (xi,...xn)‘  e  X  is  called  a  fair  allocation  of  x  over  X  if  for  each  yeX 

there  exists  a  permutation  x  of  x  which  is  lexicographically  greater  than 
or  equal  to  all  permutations  y  of  y. 

If  we  consider  the  set  X  as  a  "feasible"  set,  a  fair  allocation  vector 
x  over  X  solves  an  hierarchy  of  nested  problems.  The  first  one  maximizes 
the  minimal  entry  of  vectors  in  X.  The  second  maximizes  the  second  mini¬ 
mal  entry  of  all  vectors  which  solve  the  first  problem,  etc. 

The  usual  difficulty  with  such  a  problem  is  that  in  order  to  solve  the 
subproblem  in  the  hierarchy,  the  solutions  to  the  preceding  subproblems 
have  to  be  available.  This  was  the  case  in  [26]  where  a  fair  allocation 
was  sought  by  solving  a  nested  sequence  of  linear  programs.  It  turns  out 
in  our  case,  that  an  iterative  algorithm  has  an  advantage  in  the  sense 
that  all  the  subproblems  in  the  hierarchy  can  be  solved  simultaneously. 

This  can  be  explained  by  the  continuity  of  the  solution  of  the  j— 
subproblem  in  all  the  preceding  solutions.  A  more  detailed  explanation 
will  be  given  within  the  convergence  proof  to  an  algorithm  we  will  present 
in  Section  3.4. 

3.3  The  Problem  and  Previous  Work 

Let  N  denote  a  network  with  nodes  1,  2 . N  and  let  t  be  a  set  of 

directed  links  connecting  the  nodes.  With  each  link  act  we  associate  a 
number  c  ,  called  the  capacity  of  link  a.  Let  S  denote  a  set  of  sessions 

a 

taking  place  between  nodes.  Each  session  seS  has  an  origin  node,  a 


destination  node  and  a  simple  path  p^  leading  from  the  origin  node  to  the 
destination  node.  Define 


1  (a) 
Ps 


r 

1  if  a  belongs  to  p 

s 

0  otherwise. 


(3.3.1) 


The  kind  of  problem  we  deal  with  in  this  chapter  is  to  allocate  to  each 
session  seS  a  rate  ^  such  that  the  allocation  satisfies  a  given  criterion. 

Hayden  [33]  proposed  a  quasi-static  distributed  algorithm  which  results 
in  a  vector  7  =  { . . .  ,7s. • • •) 1  which  is  a  fai~  allocation  over  the  set 
of  all  vectors  ( . . . ,ys, .. .) '  such  that 

E  y  *ln(a)  <  p*ca  T  seS,  7  ae£  (3.3.2) 

seS  s 

and  where  o<p<l  is  a  certain  constant,  usually  taken  to  be  0.8.  Jaffe  [32] 
proposed  a  distributed  nonquasi -static  algorithm  resulting  in  a  vector  7 
such  that  the  vector  ( . . . ,7s/ 8s* •••) 1  is  a  fair  allocation  over  the  set 
of  all  vectors  ( ,, . . ,ys/Ss, • • •) 1  such  that 

yS/gS  • 1D( a )  <  Ca  -  l  D ( a )  T  seS,  7  ae£  (3.3.3) 

^s  teS  Ht 


and  where  es  is  some  constant  associated  with  session  seS. 

The  rationale  behind  (3.3.2)  is  quite  simple:  we  do  not  allow  the 
total  flow  on  each  link  to  occupy  more  than  some  fraction  of  the  total 
capacity.  The  rationale  behind  (3.3.3)  is  more  sophisticated.  Primarily, 
it  allows  us  to  accommodate  fluctuations  of  a  session  rate  which  is  a  func¬ 
tion  of  the  rate,  and  in  addition,  it  enables  us  to  establish  preferences 
among  sessions. 
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While  Jaffe's  algorithm  is  not  iterative  and  not  suitable  for  quasi  - 
static  operation,  Hayden’s  may  result  in  transient  flows  that  are  much 
much  larger  than  the  capacity  available  to  accommodate  them.  Since  the 
last  point  may  not  be  obvious  at  first  glance  we  show  it  by  an  example.  We 
first  state  his  algorithm. 

Let  nfl  be  the  number  of  sessions  traversing  link  a e4.  Then  is 
determined  by  the  iterative  algorithm 

(3.3.4) 

a  teS  rt 


X  =  min  Fr 
*  asl  (a)=  1  a 
*s 


(3.3.5) 


where  IT  for  all  aefc  is  an  arbitrary  number. 

a 

Consider  the  Figure  (3.1): 


There  are  7  sessions  in  the  network.  Five  of  them  originate  at  nodes  1  to 
5  and  terminate  at  node  12.  One  originates  at  node  1,  traverses  the 
zig-zag  path,  and  terminates  at  node  6,  and  the  last  one  originates  at  node 
11  and  terminates  at  node  12, 


Let  all  capacities  be  c  aside  from  link  11-12  with  capacity  3.5c.  A 
fair  allocation  y  =  ( . . .  ."y  , . . . ) '  over  the  set  defined  by  (3.3.2),  withp=l 
is  c/2  to  all  sessions,  aside  from  session  (11-12)  which  gets  a  rate  of  c. 
The  steady  state  R's  are 


R(ll, 10)  =  R(2, 9)  =  R(3,8)  "  R(4,7)  =  R(5.6)  =  c/2 


all  other  R's  equal  c.  Immediately  following  the  withdrawal  of  session 
(1-6)  all  R's  will  be  c,  according  to  (3.3.4).  By  (3.3.5)  this  will  result 
in  a  total  rate  of  6c  over  link  (11,12)  which  can  accomodate  only  half  of  it. 
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No  reasonable  correction  factor  such  as  p  in  (3.3.2)  can  help  here. 

The  problem  stems  from  determining  only  one  number  for  each  link.  When  a 
session  changes  its  rate,  it  changes  it  with  no  regard  to  the  particular 
rate  it  had  before. 

We  generalize  the  set  defined  by  (3.3.3)  in  the  following  way: 

Let  g  :R+  +■  R+  be  a  function  associated  with  link  ae£.  Let  f  :R+  +  R+ 
a  s 

be  a  function  associated  with  session  seS  and  which  possesses  an  inverse 
f"1  .  We  are  interested  in  a  quasi -static  algorithm  which  will  result  in 
a  vector  7  such  that  the  vector  (....f"1^),...) 1  is  a  fair  allocation 
over  the  set  of  all  vectors  ( . . . , f “ ^ ( ys ),...)'  such  that 

U  )’lp(a)  <  9a (ca  ■  l  y  • lp ( a ) ) • lp ( a )  (3.3.6) 

S  teS  t  S 

T  seS,  Y  ae£, 

l  yM  (a)  <  c,  Y  ae£,  (3.3.7) 

teS  Ht 

YS  >  0  T  seS.  (3.3.8) 

To  see  ♦he  role  of  g  ,  assume  that  f  is  the  identity  function.  De- 

a  s 

pending  on  the  length  of  time  over  which  the  rate  of  a  session  is  measured, 
we  can  have  two  interpretations  of  the  role  of  our  algorithm.  Both  inter¬ 
pretations  suggest  the  same  type  of  form  for  the  function  g  . 

a 

In  the  first  interpretation,  the  length  of  time  over  which  the  rate  is 
averaged  is  relatively  short  with  respect  to  the  "time  constant"  of  the 
counting  process  of  the  number  of  off-hook  speakers  in  talkspurt  mode.  In 
this  case  we  just  deal  with  and  worry  about  accommodating  the  speakers  which 
are  currently  in  the  talkspurt  mode.  Since  about  30%  of  a  talkspurt  is 
silence  and  some  segments  of  the  talkspurt  need  more  encoding  than  others, 
we  view  the  bit  rate  generated  by  the  Vocoder  for  session  seS  as  a 
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stochastic  process  with  mean  ys  -  the  rate  assigned  to  user  seS.  This 
amounts  to  the  assumption  that  the  Vocoder  has  the  means  of  dynamically 
reconfiguring  to  the  demands  of  the  voice  to  achieve  the  desired  average 
rate. 

Suppose  that  we  want  to  reserve  excess  capacity  on  each  link  so  as  to 
be  able  to  accommodate  at  least  a  variation  as  large  as  the  standard 
deviation  of  the  flow  on  the  link.  Assume  that  the  standard  deviation  of 
the  rate  of  a  session  seS  which  was  allocated  an  average  rate  ys  Is 
e*ys  for  some  0< 0<1 .  Let  s'eS  be  such  that 

s'  =  arg  max  i  Yt-l„(a)  1 ,  (3.3.9) 

teS  (  pt 

then,  by  the  independence  of  rates  of  different  sessions  we  get 
(abusing  notations) 

^  (  l  yM  (a))  <  (c  /yS’)172  a(  yS' )  (3.3.10) 

\  teS  pt  /  3 

<  (ca/ys')1/2s*yS  <  (ca)  1/20( YS  ) 1/2 

<  8-(c  )V2.  g^(ca  -  l  yt-lp(a)) 

teS  t 

where  the  last  inequality  billows  from  (3.3.6)  with  f  =1.  Thus  if  we  take 

s 

1  2 

9aM  =  ~2 —  (•)  T  ae£  (3.3.11) 

e  ca 

we  are  guaranteed  to  accommodate  the  standard  deviation  of  the  flow 
resulting  from  the  fair  allocation. 

In  the  second  interpretation,  the  length  of  time  over  which  the  rate  is 
averaged  is  relatively  long  with  respect  to  the  "time  constant"  of  the 
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counting  process  of  the  number  of  off-hook  speakers  in  talkspurt  mode.  In 
this  case  we  deal  concurrently  with  all  the  off-hook  sessions  and  want  to 
be  able  to  accommodate  the  standard  deviation  around  the  mean  of  the  process 
(i.e.  the  instantaneous  effect  of  the  number  of  speakers  at  the  talkspurt 
mode  is  washed  out  by  the  long  time  average).  Let  q  be  the  fraction  of 
time  a  speaker  is  in  the  talkspurt  mode  and  assume  his  rate  while  in  the 
talkspurt  mode  is  constant.  Then  using  notations  as  before 

o(  I  yt *  1  _ ( a ) )  ^  Z  [(  Yt/q)2*q(l-q)*ln(a)]1/2  (3.3.12) 

teS  pt  teS  pt 


(ca// 


a/2  s’ 

)  -y 


[q(l-q)]V2 

q 


and  we  get  a  similar  result  as  before  for  g  (  ). 

The  point  we  want  to  make  by  the  above  arguments  is  the  need  to  allow 
g,  to  be  a  nonlinear  function,  which  may  depend  on  c  ,  rather  than  only  on 
the  excess  capacity  as  (3.3.3)  implies.  The  exact  role  of  ga  is  up  to  the 
network  manager  to  decide,  and  our  formulation  allows  him  a  great  deal  of 
flexibility  in  this  regard. 

To  understand  the  role  of  fg,  notice  that  if  f~j  (yS1)  and  f~*  (yS2) 
are  entries  of  the  vector  of  fair  allocation  and  ySl  can  be  traded  for 
yS2  then  the  ratio  of  yS2/yS^  will  be  governed  by  f~|  and  f“*.  Taking 
these  functions  to  be  nonlinear  allows  changing  the  ratio  of  the  rates 
as  a  function  of  the  actual  rates  -y1  and  y?-.  Without  advocating  a  par¬ 
ticular  preference  whether  to  get  the  ratio  toward  unity  as  one  session 
approaches  the  threshold  of  intelligibility,  or  not,  having  the  possibility 
of  using  a  nonlinear  function  in  this  regard  too,  may  be  convenient  in 
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various  appl i cations. 

As  will  be  explained  in  the  final  section,  in  order  to  carry  out  the 
algorithm  of  the  next  section  we  have  to  store  in  link  ae£,  the  functions 
g  ,  and  f  for  all  s  traversing  a.  This  is  not  too  difficult  if  there  are 

a  S 

few  "priority"  classes  of  sessions  and  correspondingly  few  possibilities 
for  f  .  All  that  a  link  has  to  know  in  this  case  is  merely  the  class 
number  of  each  session  traversing  it. 

3.4  The  Algorithm 

We  will  state  and  prove  the  convergence  of  the  algorithm  in  a  centra¬ 
lized  context.  Distributed  implementation  will  be  discussed  in  the  final 
section. 

Assume  that  y,s  is  given  for  all  seS  and  that 
k 


0<Tk, 


l  Tk-lp(a)  <  ca 
teS  t 


T  ae£,  Y  seS 


(3.4.1) 


then  y^+1  is  determined  by 


Yks+i  =  m,'n  rk  *  (fs9a[c  -  l  if-1  U>]  -  i) 
a : 1 p ( a ) =1  v  teS  1 


Y  seS  (3.4.2) 

where  aa  is  to  be  specified  later  and  f  g  (•)  denotes  f  (g  (•)). 
k  s  a  s  a 

We  make  the  following  assumptions  concerning  g  and  f  : 

3  S 

Assumption  (3. A):  g^*)  anc*  f  (•)  ^or  a^  ae*  anc*  se$  are  monotonical ly 

non-decreasing. 

Assumption  (3.B);  f  g  (•)  is  convex  (concave  is  possible  too,  but  will  not 

S  3 

be  pursued)  differentiable  with 

f  g  (0)  =  0 
s  a 

0  <  f  g  (c  )  ^  m  <® 
s  a  a  sa 

for  all  seS  and  ae£. 
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(3.4.3) 


< 


I 

I 


There  are  two  options  for  choosing  ^ , 
The  first 


a 


1 


1  + 


zCmta  -  ft9a(ca  *  *  Vn(a))]‘l  (a) 
teS  ta  t  a  a  u£s  k  ?u  pt 


I  YkUa) 
teS  K  pt 


'tack,  k=l,2,... 


and  the  second 


i 


1  +  z  (ft9a),[ca  -  Z  JJln( a)3- 1_ (a) 

ta  a  U£$  k  pu  p+ 


teS 


The  step  size  (3.4.3)  is  not  defined  for 


I  Vn  =  °* 
teS  K  pt 


(3.4.4) 


In  such  a  case  we  use  the  step  size  (3.4.4).  It  will  be  evident  later  that 
this  amounts  to  defining  (3.4.3)  by  taking  limits. 

For  the  two  options  of  step  sizes  (3.4.3),  (3.4.4)  we  have  the 
following  lemma  whose  proof  is  relegated  to  Appendix  (3. A). 

Lemma  (3. A):  Let  y^  satisfy  (3.4.1)  and  let  Yk+^  be  determined  from 
Yk  by  (3.4.2)  with  aa  as  in  (3.4.3)  or  (3.4.4).  Then  under  Assumptions 
(3. A)  -  (3.B) 


0  <  Y 


k+r 


V  t  , 

z  Yk+l#1 

teS  K  1 


<  c. 


rt  S,  tack  (3.4.5) 


and 

•1(a)  (3.4.6) 

pt 

t  act. 

The  continuity  of  the  R.H.S.  of  (3.4.6)  in  c  implies  the  following 

a 


lim  sup  Z  l  1  (a)  <  Z  ftg 


teS 


t£S 


c  -  lim  sup  Z  Y^*ln(a) 
a  k-^0  u£S  K  pu 


45 


corollary. 


Corollary  to  Lemma  (3. A):  Under  the  conditions  of  Lemma  (3. A)  if  c  is  re- 

.  —  —  —  -  -  -  "  J  d 

placed  by  a  sequence  { ( c  )  }  such  that  (3.4.5)  is  satisfied  at  each  itera- 

d  K 

tion  and 


then  (3.4.6)  holds  with  c  replaced  by  c  . 

a  a 

The  idea  behind  the  choice  of  the  expressions  (3.4.3)  and  (3.4.4)  as 
well  as  the  simple  intuition  behind  Lemma  (3. A)  can  be  best  explained  by 
the  use  of  Figures  (3.2)  and  (3.3).  Let  F*  denote 


SeS 


and  let  the  function  G  (*):R+>R+  denote 

d 

l  f  g  (c  -M)l  (a). 
seS  s  a  a  Ps 


The  two  figures  depict  the  relations  between  F^  and  F^+1,  when  the 
network  consisted  of  the  single  link  a.  When  this  is  not  the  case,  (3.4.2) 
implies  that  we  have  at  most  over  estimated  F®+^.  In  Fig.  (3.2),  which 
corresponds  to  as  in  (3.4.3),  F*+^  is  determined  by  intersecting  the 
line  connecting  the  point  (o,Ga(o))  with  the  point  (F^,Ga(F^)),  with  the 

d  d  d 

line  y  =  F  .  In  Fig.  (3.3),  which  corresponds  to  as  in  (3.4.4),  F^+i  s 
determined  by  intersecting  the  tangent  to  the  graph  of  y  =  G  (F?)  at  the 

d  K 

a  d  d 

point  ( Fj,. ,  G( Fj,. ) ) ,  with  the  line  y-F  .  The  reader  can  easily  convince  him¬ 
self  the  lim  sup  F^  must,  in  both  cases,  lie  in  the  area  where 
k+« 

Fa  <  G  (Fa) 
a 

which  gives  rise  to  the  lemma. 
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As  for  the  possible  functions  for  f  and  g  ,  Assumption  (3.B)  is  not 

S  3 

too  restrictive  since  it  will  be  satisfied  for  instance  when  both  f  and  g 

s  a 

are  convex  increasing  on  (o,  »).  Figure  (3.4)  shows  why  just  monotonicity 

of  6  (  )  is  not  sufficient  for  the  lemma  to  hold. 
ax 

We  can  now  state  the  main  result  of  this  chapter. 

Proposition  (3. A);  Under  Assumptions  (3.A)-(3.B),  with  aj^  as  in  (3.4.3)  or 

(3.4.4),  the  sequence  {y|<}  >  generated  by  (3.4.2)  with  y0  satisfying 
(3.4.1),  converges  to  a  vector  Moreover  the  vector  (. . . ,f“*(YS) , . . . ) 1 
is  a  fair  allocation  over  the  set  of  all  vectors  (. . .  ,f"*( ys) . • • •) 1  where 
YS  for  all  seS  satisfy  (3.3.6)  -  (3.3.8). 

3.5  Convergence  Proof: 

Let  be  a  subsequence  such  that 

lim  y!, ,  =  lim  inf  y!  .  (3.5.1) 

keK  k+1  k 

k>« 


By  (3.4.2)  we  can  now  write 


lim  =1 im  mi n 
keK  k+1  keK  a:l  (a)=l 
k+«s  k-+®s  Ps 


“k[fs9a(Ca  '  Fk>  -  \S1 


Since  the  path  pg  consists  of  a  finite  number  of  links,  then  w.l.o.g. 

possibly  by  taking  a  subsequence. there  exists  a  link  ae£,  1  00  =  1  such  that 

’  Ps 


1  im 
keK 
k-n»s 


1  im 
keK 
k-*-°°s 


aj(f  gjc  -  FJ 
k  s  <T  a  k 


-  y; 


:)J. 


(3.5.2) 


By  relation  (3.4.3),  Assumptions  (3.A)-(3.B)  and  (3.4.3)-(3.4.4)  it  can 
be  easily  verified  that  there  exist  _e>0  and  "b>0  such  that 

a  — 

0<3_<a|<<B<l  7  ae£  k=0, 1 .... . 
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and  therefore  w.l.o.g.,  possibly  by  taking  a  subsequence  of  we  have 

1  i  m  aa  =  ti3  >  0 .  (3.5.3) 

keK  k 

k+®s 

Thus,  we  get 

lim  y?  ,  >  Tim  1 nf ( 1  -aa)Ys  +  lim  Inf  aa*f  g  (c_  -  Ff)  (3.5.4) 
keK  k+1  keK  k  k  keK  k  a  k 

k-M>°s  k-*-°°s  k-*-»s 

>  (1  -  13^)1  im  Inf  y.s  +  t**  1  im  inf  f  g-(c  -  Fa). 
l(+»  k  k-*-*  s  a  1  k 

Using  (3.5.1)  and  Assumption  (3.B)  we  obtain  from  (3.5.4) 

iim  inf  ys  )  f  g  (c  -  lim  sup  Fa).  (3.5.5) 

k-voo  *  S  T  T  k+00  k 

Since  the  derivation  of (3.5.5)  was  independent  of  s  we  can  conclude  that 
for  all  teS  there  exist  a(t)  such  that  (3.5.5)  holds  with  t  substituting  s, 
and  a(t)  substituting  "a. 

Let  a^e£  satisfy 

a.  =  arg  min  g  (c  -  11m  sup  Fa).  (3.5.6) 

1  ae£  a  a  k-*-»  * 

Using  the  monotonicity  of  f  (Assumption  (3. A))  we  have 
ft9a(t)[ca(t)  -  IJ^sup  F*(t)]  >  ft9ai[cai  -  l™  sup  F»l]  r  US,  (3.5.6) 
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t 


< 


Tim  inf  Ffl  >  £  1  im  inf  vM  (a, )  > 
k+®  k  teS  k—  k  pj  1 

*  I  ft9a  tca  “  1lm  sup  F^1 ] -1  (a.l, 
teS  t  al  al  k+®  k  pt  1 


(3.5.9) 


and  therefore  by  Lemma  (3. A) 


Tim  inf  F?1  >  nm  sup  p?l 


k+® 


k+® 


implying  that 


1 im  Fj*l 
k+®  k 


exists.  Moreover  we  now  have 

1  lim  inf  yj*l  (a  )  >  1 im  )  yt*l  (a  ) 
lf-«  k  Pt  1  k-*-®  teS  k  Pt  1 


teS  k+« 

which  implies  the  existence  of 

t. 


lim 

k+« 


Yk^p^al^ 


T  teS, 


(3.5.10) 


(3.5.11) 


and  therefore 


lim  yjl  Ui)  =  ftg  fCa  -  l  lira  y!J.lp(ai)].lp(ai)  (3.5.12) 

1  1  ucS  k-*-®  u  t 


T  teS. 


Thus  we  have  proved  the  convergence  of  the  rates  of  all  sessions 
traversing  link  a  ,  as  defined  in  (3.5.6). 

Consider  now  a  new  network  derived  from  the  previous  one  by  deleting 
link  a  ,  and  all  the  sessions  traversing  it.  We  now  only  update  the  capa¬ 
city  of  link  ae£  at  iteration  k  to  be 


ca  -lj- 


Using  this  new  network  and  the  Corollary  to  Lemma  (3. A),  we  can  show  the 

convergence  of  the  rates  of  session  traversing  link  a  which  satisfies 

2 
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i 


a„  = 


=  arg  mm 
aek  a*a 


3a(ca  - 


1 


Tim  sup 
k>» 


Fa) 

V* 


This  procedure  will  exhaust  all  the  links  in  the  network  and  since  each 
session  traverses  at  least  one  link,  we  have  therefore  proved  that  the 
rates  of  all  sessions  converge. 

To  see  that  the  vector  (. . .  ,f^^) , . . .)'  is  a  fair  allocation  over  the 

s 

set  defined  by  (3.3.6M3.3.8) ,  notice  that  by  the  existence  of  the  limit 
in  (3.5.11)  we  get  equality  in  (3.5.5)  and  therefore  by  (3.5.7)  we  get 


ft^Tt)  >  ga  [ca  -  l  YUlp(a1)],  (3.5.13) 

1  1  ueS  u 

-1  t 

with  equality  for  teS  such  that  l0(a]_)=l.  Thus  ft  (y  ),  for  teS  such  that 

t  -It 

l_(al y=l,  maximizes  the  minimal  value  of  ft  (y  )  for  all  teS  when  y 
t 

satisfies  (3.3.6).  Since  y  is  easily  seen  to  belong  to  the  set  defined  by 
( 3.3.6)-( 3.3.8) ,  it  therefore  solves  the  first  problem  in  the  hierarchy  of 
problems  that  results  in  a  vector  ( . . .  ,f^(  y5) , . . .) '  of  fair  allocation  over 
this  set.  Likewise  inductively  we  can  show  that  it  solves  the  second 
problem  in  the  hierarchy  etc.,  and  the  Proposition  is  proved. 

Q.E.D. 


3.6  Implementation  Issues 

The  iteration  (3.4.2)  is  stated  in  a  centralized  context,  whereby  all 
the  needed  quantities  are  simultaneously  available  everywhere.  Certainly 
this  is  not  the  case  in  a  geographically  distributed  network.  In  this 
section  we  modify  the  iteration  in  a  way  that  renders  it  suitable  for 
distributed  implementation. 
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Pictorially  we  look  at  each  link  and  each  origin  of  a  session  as  a  pro¬ 
cessor.  These  processors  implement  the  algorithm  by  exchanging  messages 
between  themselves.  We  assume  that  each  link  can  send  messages  to  all  ori¬ 
gins  sessions  that  traverse  it.  The  order  of  messages  on  a  path  is  pre¬ 

served.  To  avoid  the  question  of  how  the  rate  is  measured,  we  assume  we 
are  dealing  with  a  computational  algorithm  whereby  the  rate  ys  is  a 
variable  determined  by  the  ori gi n-processor  responsible  for  session  seS. 

The  distributed  version  of  iteration  (3.4.2)  will  determine  how  this 
variable  is  changed. 

The  key  to  the  distributed  version  of  the  algorithm  is  the  observation 
that  Lemma  (3. A)  holds  for  each  link  without  any  regard  to  whether  other 
links  exist  at  all  or  participate  in  the  algorithm.  With  this  in  mind,  it 
is  easy  to  rework  the  proof  of  Proposition  (3. A)  to  see  that  a  sufficient 
condition  for  the  convergence  of  the  algorithm  is  that  each  link  "performs" 
(in  a  sense  that  will  be  clear  shortly)  step  (3.4.2)  infinitely  often. 

Thus  we  propose  the  following  model. 

Each  link  ae£  keeps  a  list  of  local  variables  -ys  for  all  sessions  seS 

d 

that  traverse  it  (together  with  the  functions  f  and  ga).  An  origin  of 

session  s  keeps  a  list  of  variables  7s  for  all  links  a  on  the  path  p  .  We 

a  s 

assume  that  by  a  certain  internal  rule  a  link  ae£  "awakes"  in  certain  times 
such  that  the  "awake"  occurs  infinitely  often.  Each  time  a  link  ae£ 

"awakes"  it  performs,  subject  to  conditions  permitting  it,  a  "link  message" 
procedure.  Similarly,  an  origin  to  session  seS  "awakes"  once  in  a  while 
and  performs  a  "node  message"  procedure.  A  link  which  receives  a  message 
from  an  origin  node  performs  a  "link  update"  procedure  and  an  origin  to  a 
session  which  receives  a  message  from  a  link  performs  a  "node  update"  pro¬ 
cedure.  The  details  of  the  procedures  are: 
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Link  Message:  (link  "awakes") 


1.  To  each  session  s  traversing  a,  link  a  sends  a  message  containing  the 
quantity 

Ta  +  aa[fs9a<ca  “  L  TaVa))  "  ^ 

ueS  u 

where  y!  is  the  value  of  the  correspondi ng  local  variable  at  the  time  of 

a 

a  s 

the  "awake"  and  a  is  as  in  (3.4.3)  or  (3.4.4)  with  ya  replacing  y|<. 

2.  A  new  "awake"  is  aborted  until  a  "node  message"  is  received  from  all 
origins  traversing  a.  For  each  origin  the  "node  message"  should  come  after 
an  acknowledgement  to  the  message  sent  in  1  is  received. 

Node  Message:  (origin  of  session  s  "awakes") 

1.  yS  +  min  y* 

a  s.t.  1  { a) =1 

«.  s 

2.  y  is  sent  to  all  links  ae£  on  the  path  pg. 

Link  Update;  (message  from  origin  s  arrives  at  link  a) 

1.  Y^-value  of  the  message. 

2.  Send  an  acknowledgement. 

Node  Update:  (message  from  link  a  arrives  at  origin  of  session  s) 

1.  y^  ♦  value  of  the  message. 

2.  Send  an  acknowledgement. 

It  is  not  difficult  to  ascertain  from  the  observation  made  above  that 
this  distributed  procedure  will  result  in  values  of  ys>  that  will  converge 
to  the  value  that  would  have  been  produced  had  a  centralized  algorithm 
been  used. 

A  careful  examination  of  all  possible  sequencing  of  events  that  may 
occur  in  the  network  during  the  implementation  of  the  algorithm  reveals 
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that  the  sum 


is  not  guaranteed  not  to  exceed  at  all  times.  In  the  way  the  algorithm 

is  stated  the  above  sum  can  be  proved  not  to  exceed  capacity  only  at 

moments  of  unaborted  "awakes".  The  problem  is  that  after  a  "link  message" 

is  performed,  those  sessions  that  are  allowed  by  the  link  to  increase  might 

respond  before  those  that  are  instructed  to  decrease.  This  by  itself  will 

not  cause  a  problem  since  it  can  be  shown  that  any  combination  of  ys  at 

a 

"awake"  time  and  the  values  sent  to  the  origins  by  the  link  does  not  exceed 
the  capacity.  But  in  between  the  link's  "awake"  and  the  response  of  the 
origins,  sessions  that  are  instructed  to  decrease,  can  in  the  meantime 
increase  on  the  behalf  of  actions  taken  by  other  links.  This  is  a  very 
rare  but  still  possible  occurance. 

The  remedy  is  to  modify  the  "link  message"  procedure  by  sequencing  the 
messages  sent  in  step  1  properly.  Notice  that  at  the  beginning  of  each 
"awake"  the  values  are  upper  bounded  by  the  values  sent  at  the  previous 
"awake".  If  we  first  send  the  messages  whose  values  are  below  the  corres¬ 
ponding  values  above  and  then  wait  for  the  appropriate  node  "awake",  we 
preclude  the  possibility  mentioned  in  the  previous  paragraph.  The  exact 
statement  of  the  modified  "link  message"  procedure  can  be  easily  worked 
out. 

An  alternative  remedy  is  to  abort  a  "node  message"  procedure  until  a 

node  which  is  an  origin  to  session  s,  has  heard  from  all  the  links  on  p  , 

s 

since  it  last  performed  the  procedure. 

The  first  remedy  is  cumbersome;  the  second  remedy  might  slow  down  the 
algorithm. 
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4  Data  Routing 


4.1  Introduction 

In  this  chapter  we  consider  an  iterative  method  for  the  solution  of  the 
multicommodity  flow  routing  problem.  We  are  mainly  interested  in  an  itera¬ 
tive  method  which  converges  fast  and  facilitates  distributed  computation. 
These  two  characteri sties  are  the  key  to  a  successful  quasi-static  algorithm 
for  routing  messages  in  a  computer  communication  network. 

The  fast  convergence  allows  a  static  routing  algorithm,  aimed  at 
achieving  an  optimal  flow  pattern  when  inputs  are  fixed,  to  be  used  when 
inputs  are  changing.  If  the  convergence  is  fast  and  the  variations  in  the 
inputs  are  slow,  the  algorithm  will  result  in  an  instantaneous  flow  pattern 
which  is  close  to  the  optimal  flow  pattern  with  respect  to  the  instan¬ 
taneous  inputs. 

The  distributed  computation  is  particularly  appealing  in  computer  com¬ 
munication  networks  where  each  node  is  a  computer.  It  facilitates  a  speedup 
of  the  algorithm,  it  increases  the  survivability  of  the  algorithm  when 
nodes  are  subject  to  failure  and  it  increases  the  modularity  of  the  net¬ 
work.  In  particular  when  distributing  computation  in  an  environment  of 
changing  inputs,  the  algorithm  should  not  utilize  parameters  which  are  sen¬ 
sitive  to  those  changes.  This  means  for  instance  the  use  of  a  constant 
stepsize  which  preserves  the  fast  rate  of  convergence  for  a  broad  spectrum 
of  inputs. 

The  need  to  achieve  fast  convergence,  coupled  with  the  need  to  utilize 
constant  stepsize  applicable  to  a  broad  spectrum  of  inputs,  make  Newton's 
method  a  top  candidate,  since  it  converges  superl inearly  and  employs  step- 
size  of  unity  for  any  input.  The  superl inear  convergence  when  unity  step 
is  employed,  is  guaranteed  only  when  the  algorithm  starts  near  the  optimum, 


but  this  is  exactly  the  situation  in  which  the  quasi  static  algorithm  is 
operati  ng. 

The  above  consideration  led  to  a  research  effort  ([11],  [23])  aimed  at 
imitating  Newton's  method  while  facilitating  distributed  computation. 
Unfortunately,  in  all  imitations  the  distributed  computation  was  achieved 
at  the  expense  of  the  deterioration  of  the  convergence  rate  from  superli- 
near  to  linear. 

Most  of  the  algorithms  suggested  in  ([11],  [23])  can  be  cast  as 
variations  of  a  projected  Newton  step  in  which  the  Hessian  of  the  objective 
function  was  approximated  by  a  diagonal  matrix.  Two  reasons  accounted  for 
the  need  of  a  diagonal  approximation.  The  first  was  the  difficulty  of  com¬ 
puting  the  off  diagonal  terms.  The  second,  and  the  more  serious  dif¬ 
ficulty,  was  that  of  distributing  the  computation  associated  with  the  pro¬ 
jection,  once  the  approximation  to  the  Hessian  was  not  diagonal  . 

In  the  sections  that  follow  we  propose  a  way  to  overcome  the  above  two 
difficulties.  In  Sections  4.2  -  4.6  we  take  up  the  problem  of  approxi¬ 
mating  the  Hessian  by  a  nondiagonal  approximation  while  at  the  same  time 
projecting  with  respect  to  a  diagonal  matrix,  thus  facilitating  the 
distributed  computation  of  the  projection.  It  is  shown  that  the  non¬ 
diagonal  approximation  can  be  taken  in  such  a  way  as  to  preserve  the 
superl inear  rate  of  convergence  in  spite  of  the  fact  that  the  projection  is 
made  with  respect  to  a  diagonal  matrix.  These  sections  are  cast  within  a 
rather  general  framework  and  by  themselves  constitute  a  contribution  to  the 
nonlinear  programming  methodology. 

In  Section  4.7,  the  first  difficulty,  that  of  computing  the  nondiagonal 
elements  of  the  Hessian  is  considered.  The  difficulty  encountered  with 
those  elements  stems  from  the  space  of  routing  variables  used  in 
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[11], [23].  This  space,  of  fractions  decomposed  by  destinations  as  pre¬ 
sented  first  in  [10],  gives  rise  to  a  nonlinear  transformation  from  the 
routing  variables,  namely,  the  fractions,  to  the  link  flows.  We  propose 
the  space  of  path  flows  in  which  the  transformation  from  the  routing 
variables,  namely,  the  path  flows,  to  the  link  flows  is  linear.  In  this 
space  the  Hessian  obtains  a  particularly  simple  structure  which  facilitates 
distributed  computation. 

The  distributed  computation  facilitated  is  that  of  a  solution  of  a 
system  of  linear  equation  built  around  the  Hessian.  We  use  the 
Conjugate-Gradient  method  to  solve  this  system  and  show  that  all  the  opera¬ 
tions  involved  can  be  decomposed  according  to  the  nodes  in  the  network,  in 
a  way  that  does  not  require  any  node  to  know  the  global  topology  of  the 
network . 

The  advantages  gained  by  considering  the  routing  problem  in  the  space 
of  path  flows  may  by  itself  be  a  good  reason  for  the  use  of  these  variables 
in  a  computer  communication  network,  instead  of  fractions.  Moreover,  in 
Chapter  5,  which  deals  with  data  flow  control  we  give  an  additional  impetus 
to  this  change  of  variables.  This  is  in  the  spirit  of  this  thesis,  that 
of  employing  compatible  algorithms  for  the  various  tasks  associated  with 
the  delivery  of  a  packet  from  an  origin  to  a  destination. 

We  conclude  in  Section  4.7  with  a  computational  example  of  the  multi- 
commodity  flow  problem  in  the  space  of  path  flows. 
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4.2  Two  Metric  Projection  Method 


Projection  methods  stemming  from  the  original  proposal  of  Gold¬ 
stein  [1],  and  Levitin  and  Pol jak  [2]  are  often  very  useful  for  solving 
the  problem 

minimize  f(x) 


(4.2.1) 


subject  to  xeX 


where  f:H-*R  and  X  is  a  convex  subset  of  a  linear  space  H.  These  methods 
take  the  form 

Vi =  Vxk  •  Vk’  ,4-2-21 


where  is  a  positive  scalar  stepsize,  P^(0  denotes  projection  on  X 

with  respect  to  some  Hilbert  space  norm  n.n  on  H  and  g  denotes  the 

k  k 

Frechet  derivative  of  f  with  respect  to  M,  ,  i.e.,  g  is  the  vector 

k  k 

in  H  satisfying 


f ( x )  =  f(x^)  +  <gk , x-xk>k  +  o(lx-x  l). 


(4.2.3) 


where  <•»•>  denotes  the  inner  product  corresponding  to  ii.ii  . 

x  k 


As  an  example  let  H=Rn ,  and  be  an  nxn  positive  definite  symmetric 
matrix.  Consider  the  inner  product  and  norm  corresponding  to  B 


<x,y>k  =  x'Bky, 


,  1/2 

I x tik  =  (<x,x>)  ,  T  x.yeH, 


(4.2.4) 


where  all  vectors  above  are  considered  to  be  column  vectors  and  prime 
denotes  transposition.  With  respect  to  this  norm  we  have  [cf .(4.2.3) ] 

9k  =  8k*  7f(xk}’  (4.2.5) 

where  vf ( x^ )  is  the  vector  of  first  partial  derivatives  of  f 
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7f(V  ■ 


(4.2.6) 


3f,Xk> 

3X1 

• 

3f(x) 

k 


When  problem  (4.2.1)  is  unconstrained  (X  =  H),  iteration  (4.2.2)  takes  the 

familiar  form 

x.  =  x  -  a  B-1vf(x  ). 
k+1  k  k  k  k 


Otherwise  the  vector 


Vl  =  Pk(xk  “  “kV 

is  the  solution  of  the  problem 

minimize  ux  -  x  +  a  g  u2 
k  k  k  k 

subject  to  xeX. 

A  straightforward  computation  using  (4.2.4)  and  (4.2.5)  shows  that  the  problem 
above  is  equivalent  to  the  problem 

minimize  Vf(xk)'(x  -  x^)  +  (x_xk^  ’B^x-x^)  (4.2.7) 

subject  to  xeX. 


When  X  is  a  polyhedral  set  and  B^  is  a  Quasi-Newton  approximation  of 
the  Hessian  of  f,  the  resulting  method  is  closely  related  to  recursive 
quadratic  programming  methods  which  currently  enjoy  a  great  deal  of 
popularity  (e.g.,  Garcia-Palomares  [3],  Gill  et  al  [4]). 

It  is  generally  recognized  that  in  order  for  methods  of  this  type  to  be 
effective  it  is  essential  that  the  computational  overhead  for  solving  the 
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quadratic  programming  problem  (4.2.7)  should  not  be  excessive.  For  large- 

scale  problems  this  will  be  so  only  if  the  matrix  3  is  chosen  in  a  way 

that  matches  the  structure  of  the  constraint  set.  For  example  if  X  is 

m 

the  Cartesian  product  n  X  0f  m  simpler  sets  X,,  the  matrix  Bk  should 

i  =  l  1  * 

be  block  diagonal  with  one  block  corresponding  to  each  set  X  in  which 

i  ’ 

case  the  projection  problem  (4.2.7)  decomposes  naturally.  Unfortunately  such 
a  choice  of  precludes  the  possibility  of  superliner  convergence  of  trig 
algorithm  which  typically  cannot  be  achieved  unless  B  is  chosen  to  be  a 

l\ 

suitable  approximation  of  the  Hessian  matrix  of  f  ([3],  [5]). 

In  this  Chapter  we  propose  projection  methods  of  the  form 

xk+i  =  p(*k  “  (4.2.8) 

where  the  norms  n • n  and  a . n  corresponding  to  the  projection  and  the 

l\ 

differentiation  operators  respectively  can  be  different.  This  allows  the 
option  to  choose  a.ii  to  match  the  structure  of  X,  thereby  making  the 
projection  operation  computationally  efficient,  while  reserving  the  option 
to  choose  il. u  on  the  basis  of  second  derivatives  of  f  thereby  making  the 
algorithm  capable  of  superlinear  convergence.  When  H  =  Rn,  the  projection 
norm  n.n  is  the  standard  Euclidean  norm 

“XII  =  (x'x)1^2  =1  x  I  ,  (4.2.9) 

and  the  derivative  norm  |.i  is  specified  by  an  nxn  positive  definite 
symmetric  matrix 

llx"k  =  (4.2.10) 

the  vector  x^+^  of  (4.2.8)  is  obtained  by  solving  the  quadratic  programming 
subproblem 


(4.2.11) 


1  2 

minimize  gR ’  { x  -  x^  +  .^1-1  x  -  xk  I 
subject  to  x  e  X 

where 

gk  =  B”1vf(x|().  (4.2.12) 

The  quadratic  programming  problem  (4.2.11)  may  be  very  easy  to  solve  if 
X  has  special  structure.  As  an  example  consider  the  case  of  an  orthant 
constre'i  nt 

X  =  { x  I  0  <  x1 ,  i  =  1, . . . ,n) .  (4.2.13) 

Then,  the  iteration  takes  the  form 

Vl  ■  -  \Bkl7f(X|c)1+  (4-2-141 

where  for  any  vector  veRn  with  coordinates  v  ,  i  «  1 . n  we  denote  by  v+ 

the  vector  with  coordinates 

•  ,  i 

(v1 )  =  max  (0,v  }. 

Iteration  (4.2.14)  was  first  proposed  in  Bertsekas  [6],  and  served  as  the 

starting  point  for  the  present  paper.  It  is  not  true  in  general  that  for 

an  arbitrary  positive  definite  choice  B  ,  iteration  (4.2.14)  is  a  descent 

k 

iteration  [in  the  sense  that  if  xk  is  not  a  critical  point  of  f  over  the 

set  X  of  (4.2.13)  then  for  a,  sufficiently  small  we  have  f(x  )  <  f(x  )]. 

x  k+i  k 

Indeed  this  is  the  main  difficulty  in  constructing  two-metric  extersions 
of  the  Goldstein-Levi tin-Pol jak  method.  It  was  shown,  however  in  [6]  that 
if  B^  is  chosen  to  be  partially  diagonal  with  respe-t  to  a  suitable  sub¬ 
set  of  coordinates  then  (4.2.14)  becomes  a  descent  iteration.  We  give  a  non¬ 
trivial  extension  of  this  result  in  the  next  section  (Proposition  ( 4 . A) ) . 
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(he  construction  of  the  "scaled  gradient"  g|(  satisfying  the  descent  con 
di  tion 


<9k, Vf(xk)>  0  (4.2.15) 

is  based  on  a  decomposition  of  the  negative  gradient  into  two  orthogonal 
components  by  projection  on  an  appropriate  pair  of  cones  that  are  dual  to 
each  other.  One  of  the  two  components  is  then  "scaled"  by  multiplication 
with  a  positive  definite  self-adjoint  operator  (which  may  incorporate 
second  derivative  information)  and  added  to  the  first  component  to  yield 
gk«  The  method  of  construction  Is  such  that  g^,  in  addition  to  (4.2.15), 
also  satisfies 

f[P(xk  -  agk)J  <  f(Xk) 

for  all  a  in  an  interval  (O.a,  ],  "a  >0. 

k  k 

Section  4.4  describes  the  main  algorithm  and  proves  its  convergence. 
While  other  stepsize  rules  are  possible,  we  restrict  attention  to  an 
Armijo-like  stepsize  rule  for  selecting  on  the  arc 

{zl  z  =  P(xk  -  agk),  a>0 } 

which  is  patterned  after  similar  rules  proposed  in  Bertsekas  [6],  [7]. 
Variations  of  the  basic  algorithm  are  considered  in  Section  4.6,  while  in 
Section  4.5  we  consider  rate  of  convergence  aspects  of  algorithm  (4.2.8), 
(4.2.11),  (4.2.12)  as  applied  to  finite  dimensional  problems.  We  show  that 
the  descent  direction  gk  can  be  constructed  on  the  basis  of  second  deriva¬ 
tives  of  f  so  that  the  method  has  a  typically  superl inear  rate  of  con¬ 
vergence.  Here  we  restrict  attention  to  Newton-like  versions  of  the 
algorithm.  Quasi -Newton,  and  approximate  Newton  implementations  based  on 
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successive  overrelaxation  or  conjugate  gradient  methods  are  possible.  In 
fact  a  superlinearly  convergent  conjugate  gradient-based  implementation  of 
the  method  is  applied  to  large-scale  multicommodity  flow  problems  in 
Section  4.7. 

While  the  algorithm  is  stated  and  analyzed  in  general  terms  we  pay 
special  attention  to  the  case  where  X  is  a  finite  dimensional  polyhedral 
set  with  a  decomposable  structure  since  we  believe  that  this  is  the  case 
where  the  algorithm  is  most  likely  to  find  application. 


64 


4.3  The  Algorithmic  Map  and  Its  Descent  Properties 
Consider  the  problem 

minimize  f(x)  (4.3.1) 

subject  to  xeX 

where  f  is  a  real -valued  function  on  a  Hilbert  space  H,  and  X  is  a  non¬ 
empty,  closed,  convex  subset  of  H.  The  inner  product  and  norm  on  H  will  be 
denoted  by  <.,.>  and  u.u  respectively.  We  say  that  two  vectors  x,  yeH 
are  orthogonal  if  <x,y>  =  0.  For  any  zeH  we  denote  by  P(z)  the  unique 
projection  of  z  on  X,  i.e., 

P(z)  =  arg  min  { u x  -  zul  xeX}.  (4.3.2) 

We  assume  that  f  is  continuously  Frechet  differentiable  on  H.  The  Frechet 
derivative  at  a  vector  x  e  H  will  be  denoted  by  vf(x).  It  is  the  unique 
vector  in  H  satisfying 

f(z)  =  f ( x)  +  <vf( x) , z  -  x>  +  o(uz  -  XII) 

where  o(ilz  -  xii)/iiz  -  xi  +  0  as  z  +  x.  We  say  that  a  vector  x*e  X  is 
critical  with  respect  to  problem  (16)  if 

*  * 

<Vf(x  ),x  -  x  >  >  0,  T  xeX,  (4.3.3) 

or  equivalently,  if  x*  =  P[x*  -  vf(x*)j  for  all  a  >  0. 

It  will  be  convenient  for  our  purposes  to  represent  the  set  X  as  an 
intersection  of  half  spaces 

X  =  { x  |  <ai,x>  <  bi;,  T  i  eI  } ,  (4.3.4) 

where  lisa,  possibly  infinite,  index  set  and,  for  each  iel,  a.  is  a 
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nonzero  vector  in  H  and  b_j  is  a  scalar.  For  each  closed  convex  set  X  there 
exists  at  least  one  such  representation.  We  will  assume  that  the  set  I 
is  nonempty  -  the  case  where  I  is  empty  corresponds  to  an  unconstrained 
problem  which  is  not  the  subject  of  this  paper.  Our  algorithm  wi  1 1  be 
defined  in  terms  of  a  specific  collection  {(a.  ,1^ )  1  i el}  satisfying  (4.3.4) 
which  will  be  assumed  given.  This  is  not  an  important  restriction  for 
many  problems  of  interest  including,  of  course,  the  case  where  X  is  a 
polyhedron  in  Rn. 

We  now  describe  the  algorithmic  mapping  on  which  our  method  is  based. 

For  a  given  vector  xeX  we  will  define  an  arc  of  points  |x(a)  I  a  >  0}  which 

depends  on  an  index  set  Ic  I  and  an  operator  D  which  will  be  described 

x  x 

further  shortly.  The  index  set  I  is  required  to  satisfy 

Ixo  {i el  |  <ai  ,x>  >  bj  -  eaai  l }  (4.3.5) 

where  e  is  some  positive  scalar.  Let  C  be  the  cone  defined  by 

Cx  =  {z  (  <ai ,z>  <  0,  Y  i e I x }  (4.3.6) 

and  C+  be  the  dual  cone  of  C 
x  x 

cx  =  {z\  <y,Z>  <  0,  -V-  yeCx}.  (4.3.7) 

For  orientation  purposes  we  mention  that  if  X  is  a  polyhedral  subset  of 

Rn  (or  more  generally  if  the  index  set  I  is  finite),  and  e  is  sufficiently 

small,  then  I  can  consist  of  the  indexes  of  the  active  constraints  at  x, 
x 

i.e.,  we  may  take  I  =  { i  I  <a.,x>  =  b. ,  iel}.  In  that  case  C  is  the  cone 

All  A 

of  feasible  directions  at  x,  while  C*  is  the  cone  generated  by  the  vectors 
a.  correspoding  to  the  active  constraints  at  x.  More  generally  C  is  a 

1  X 
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(possibly  empty)  subset  of  the  set  of  feasible  directions  at  x,  and  for  any 
AxeC^  with  iaxh  <  e  the  vector  x  +  ax  belongs  to  X. 

Let  d  be  the  projection  of  [-vf(x)]  on  C  ,  i.e., 

X  X 

dx  =  arg  min  {lz  +  vf(x)  I  j  zeCx}.  (4.3.8) 

Define 

d*  =  -  [ vf(x)  +  dx].  (4.3.9) 

It  can  be  easily  seen  that  the  vectors  d  and  d+  are  orthogonal  and  that 

xx 

dx  is  the  projection  of  [-7f(x)j  on  C*  ,  i.e., 

dx  =  arg  min  { uz  +  vf(x) I  I  zeC*} .  (4.3.10) 

Note  that  if  the  norm  u*u  on  H  is  such  that  projection  on  the  set  X  is 
relatively  simple  then  typically  the  same  is  true  for  the  projection  (4.3.8), 
required  to  compute  d  and  d+. 

X  X 

Let  r  be  the  subspace  spanned  by  the  elements  of  C  which  are  orthog- 
x  x 

onal  to  d+,  i.e., 
x 

rx  =  span  iCxnizl  <z>dx>  =  °H-  (4.3.11) 

Note  that 

dxerx  (4.3.12) 

since  dx  belongs  to  Cx  and  is  orthogonal  to  d^.  Let  Dx:rx-*-rx  be  a 

positive  definite  self-adjoint  operator  mapping  rx  into  itself.  Consider 

.  .  ~  + 

the  projection  dx  of  Dxdx  on  the  closed  cone  Cxf)|zkz,d  >  =  0},  i.e. 

rw 

dx  =  arg  min  { uz-Dxdxu  I  ze  Cx,  <z,d^>  =  0).  (4.3.13) 
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Consider  also  the  "direction"  vector 


g  =  -(d*  +  dx).  (4.3.14) 

Given  x,  I  and  D^,  our  algorithm  chooses  the  next  point  along  the  arc 

x(a)  =  P(x  -  ag),  a  >  0.  (4.3.15) 

The  stepsize  a  will  be  chosen  by  an  Armijo-like  stepsize  rule  that  will  be 
described  in  the  next  section. 

The  process  by  means  of  which  the  direction  g  is  obtained  is  illu¬ 
strated  in  Figures  4.1  -  4.4  The  crucial  fact  that  will  be  shown  in 
Proposition  (4. A)  below  is  that,  if  x  is  not  critical,  then  for  sufficiently 
small  a  >  0  we  have  f[x(a)]  <  f(x),  i.e.,  by  moving  along  the  arc  x(a)  of 
(4.3.15)  we  can  decrease  the  value  of  the  objective.  Furthermore  we  have 
<vf(x),g>  0  which  means  that  g  can  be  viewed  as  a  "scaled"  gradient,  i.e., 

the  product  of  vf(x)  with  a  positive  definite  self-adjoint  operator. 

We  now  demonstrate  the  process  of  calculating  the  direction  g  for  some 
Interesting  specially  structured  constraint  sets. 

Example  (4. A);  Let  H  =  Rn,  <x,y>  =  x'y,  and  X  be  the  positive  orthant 

X  =  { x  I  x1  >  0,  i  =  l,...,n}. 

Then  X  consists  of  the  intersection  of  the  n  half spaces  { x I  x1  >0} 

i  =  l,...,n  and  is  of  the  form  (4.3.4).  The  set  I  must  contain  all  indices 

x 

i  such  that  0  <  x1  <  e  [cf.  (4.3.5)].  The  cones  C  and  C+  are  given  by 

X  X 

Cx  =  { z  1  z1  >  0,  T  i  e I x } ,  cx  =  { z  J  z  <  0,  T  ielx,  =  0  T  j£Ix} 
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(4.3.8),  (4.3.9)]  have  coordinates  given  by 

Wx 

i  el 

x 

f  0  if  UI 

x 

X  < 

-  iH*i  if  i el 
ax1  x 

where 

I  =  {1  I  lei  and  >  o}. 

ax1 


The  vector  d  ,  and  d+  f  cf . 

x  x  L 


af(x) 


if 


d1 


ax 

o  if 


If  I  is  empty  then  r  *  Rn  and  we  have  d  =  -vf(x),  d+  =  0.  In 
*  x  xx 

this  case  g  =  -D  d  =  D  7f(x)  where  D  is  any  nxn  positive  definite 
xxx  x 

symmetric  matrix.  If  I  is  not  empty,  by  rearranging  indices  if  necessary 

A 

assume  that  for  some  Integer  p  with  0  <  p  <;  n-1  we  have  I  =  (p+1 . n}. 

X 

Partition  vf(x)  as 


vf(x) 


w 


where  w  e  RP  and  w  e  Rn"P.  The  vector  g  is  given  by 


9  = 


(Dxw)* 

w 


73 


n  ~  # 

where  is  a  pxp  positive  definite  symmetric  matrix,  (Dxw)  denotes  pro- 
jection  of  D^w  on  Cx,  i.e.  (Dxw)  is  obtained  from  Dxw  by  setting  to  zero 
those  coordinates  of  D  w  which  are  negative  and  their  indices  belong  to  IY. 

A  A 


Example  2.  Let  H  =  Rn,  and  X  be  the  unit  simplex 


i 

x  =  {xl  l  x,  =  1,  x  >  0,  i  =  1, .. .  ,n} .  (4.3.16) 

i  =  l 


Suppose  the  inner  product  on  Rn  is  taken  to  be 


n  . 

<x,y>  =  l  sVy1  (4.3.17) 

i  =  l 

where  s1 ,  i  =  l,...,n  are  some  positive  scalars.  Let  I  be  a  set  of 

x  — 

indices  including  those  indices  i  such  that  0  <  x1  <  e/  s1  .  Then  the 

cone  C  can  be  taken  to  be 
x 

Cx  =  {z  |  l  z,  =  0,  z1  >  0,  T  iel„}.  (4.3.18) 

I  .  =  1  l  x 


The  vector  d  is  obtained  as  the  solution  of  the  projection  problem 

X 

minimize  |  l  [z  + -j  -^p-]2  (4.3.19) 

i=l  s  3x 


in  .  ^ 

subject  to  l  z1  =  0,  z  >  0,  iel. 

i  =  l  x 

The  solution  of  this  problem  is  very  simple.  By  introducing  a  Lagrange 

n 

multiplier  \  for  the  equality  constraint  l  z1  =  0  we  obtain  that  X  is  the 

i=l 

solution  of  the  simple  piecewise  linear  equation 


L7 

iel  s 
X 


r.  3f(x)+  .  1 

-  — rJ  +  L  t 

3X  S 

X 


u  - 


3f(x) 


3X 


J  =  0 


(4.3.20) 
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Once  x  is  obtained  the  coordinates  of  d^  are  given  by 


dx  =<l 


1  ,  3f(x)  +  ...A 

—  [X - —]  if  lei 

S1  3X1  } 


1  ,  3f(x)  ,  .  ,A 

7[x~ 1 


The  vector  d*  is  then  obtained  from  the  equation 
4*  >  -  Ivflx)  +  d  ]. 


Let 


Ix  =  {1  |  lei  and  x  <  . 

3x 

It  is  easily  verified  that  the  subspace  r  is  given  by 

X 

n  i  i 

rx  =  {  z  |  l  2  =  0,  Z  =  0,  T  ieTx}. 


(4.3.21) 


(4.3.22) 


(4.3.23) 


The  vector  dx  is  obtained  as  the  solution  of  the  simple  projection  problem 

n 

minimize  *  Y  s1  [z1  -  (D  d  l1’]2  (4.3.24) 

2  1-1  *  * 

^  i  i  ^  j  ~ 

subject  to  l  z  =  0,  z  >  0,  Y  ielx,  z  =  0  T  jelx 

i=l 

where  (D  d  )]  is  the  ith  coordinate  of  the  vector  D  d  obtained  by 
x  x  xx 

multiplying  d  with  an  nxn  symmetric  matrix  D  which  maps  r  into  r 
x  xxx 

and  is  positive  definite  on  r  .  We  will  comment  further  on  the  choice  of 

x 

in  the  last  section  of  the  paper.  The  vector  g  is  given  now  by 
g  =  -( 3T  +  d*).  Note  that  the  solution  of  both  projection  problems  (4.3.19) 
and  (4.3.24),  as  well  as  the  problem  of  projection  on  the  simplex  X  of 
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(4.3.16)  is  greatly  simplified  by  the  choice  of  the  "diagonal"  metric  spe¬ 
cified  by  (4.3.17) . 

Proposition  (4. A)  below  is  the  main  result  regarding  the  algorithmic 
map  specified  by  (4.3.5)  -  (4.3.9),  (4.3.13)  -  (4.3.15).  For  its  proof  we 
will  need  the  following  lemma  the  proof  of  which  is  given  in  Appendix  (4. A). 
Lemma  (4. A) :  Let  fl  be  a  closed  convex  subset  of  a  Hilbert  space  H,  and  let 
PM  denote  projection  on  fl.  For  every  xen  and  zeH: 
a)  The  function  h:(0,»)  +  R  defined  by 


h(.)  - az)  - x" 
a 


¥  a  >  0 


is  monotonically  nonincreasing. 

b)  If  y  is  any  direction  of  recession  of  n  [i.e.,  (x  +  o y)en  for  all  a  >  0], 
then 

<y,x  +  z>  <  <y,P0(x  +  z)>.  (4.3.5) 


Proposition  (4. A):  For  xeX,  let  e  >  0  and  I  satisfy  (4.3.5),  and  let 

X 

D  :r  +  r  be  a  positive  definite  self-adjoint  operator  on  the  subspace 

XX  X 

rx  defined  by  (4.3.6)  -  (4.3.11).  Consider  the  an:  { x( ot)  I  a  >  0}  defined 
by  (4.3.8),  (4.3.9),  (4.3.13)  -  (4.3.15). 


a)  If  x  is  critical,  then 

x( a)  =  x,  T  a  >  0. 

b)  If  x  is  not  critical,  then 

<Vf(x)  ,g>  =-  0,  (4.3.26) 

and 

<Vf(x)  ,X-x{  a)>  >  a  «ix,Dxdx>  lx(a)  -  (x+adx)l|2>  0  T  ae(0,-jf|j) . 

(4.3.27) 
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Furthermore  there  exists  It  >  0  such  that 

f ( x)  >  f  [ x( a)  ] ,  -y-  ae(0,Tt].  (4.3.28) 

Proof:  a)  It  is  easily  seen  that  for  every  zeC^  we  have 

(x+17fZ)ex  (4.3.29) 

in  view  of  the  definitions  (4.3.4)  -  (4.3.6).  Since  x  is  critical  we  have 
<vf(x),y-x>  >  0  for  all  yeX.  Therefore  using  (4.3.29)  we  have 

<Vf(x),z>  >0,  t  ZeCx.  (4.3.30) 

From  the  definitions  of  C+,  d  and  d+  [cf.  (4.3.6)  -  (4.3.9)]  and  (4.3.30) 

XXX  J 

it  follows  that 

-  vf(x)eCx 
and 

d+  =  -  vf ( x ) ,  d  =  0. 
x  x 

Using  (4.3.13)  -  (4.3.15)  we  obtain  x(a)  =  P[x  -  aVf(x)].  Since  x  is  criti¬ 
cal  we  have  that  x  =  P[x  -  aVf(x)]  for  all  a  >  0  and  the  conclusion  follows. 

b)  We  have  using  the  facts  vf(x)  =  -(d  +  d+)  and  <cf  ,d+>  =  0 

xx  xx 

<dx,vf(x)>  =  -  <dx,dx  +  dx>  =  -  <dx,dx>.  (4.3.31) 

Now  dx  is  the  projection  of  D  d  on  the  cone  C  0  {z  1  <z,dY>  =  0],  and  d 

is  a  direction  of  recession  of  this  cone  since  it  belongs  to  it.  Therefore 
from  Lemma  (4. A)  part  b)  we  obtain 

<dx,dx>  >  <dx,Dxdx>.  (4.3.32) 
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Combining  (4.3,31)  and  (4.3.32)  we  obtain 

<dx,Vf(x)>  <  -  <dx,Dxdx>  <  0  (4.3.33) 

where  the  second  inequality  is  strict  if  and  only  if  d  ^  o.  Also  d+  is 

X  X 

the  projection  of  -  Vf(x)  on  C+,  so 

X 

<d+,Vf(x)>  <  0  (4.3.34) 

x 

with  strict  inequality  if  and  only  if  d+  *  0.  Combining  (4.3.33)  and 

X 

(4.3.34)  and  using  the  fact  g  =  -  (dx  +  dx),  we  obtain 

<g,Vf(x)>  >  0  (4.3.35) 

with  equality  if  and  only  if  d  =  0  and  d+  =  0,  or,  equivalently  vf ( x)  =  0. 
Since  x  is  not  critical  we  must  have  Vf(x)  *  0,  so  strict  inequality  holds 
in  (4.3.35)  and  (4.3.26)  is  proved. 

Take  any  ae(0,  j|^-) .  Since  projection  on  a  closed  convex  set  is  a 
nonexpansive  operator  (see  e.g.  [8]),  we  have 

u x( a)  -  XII  <  ax  -  ag  -  xa  =  alga  <  e  (4.3.36) 

Therefore  we  have 

<a-j  ,x>  «=  b]  -  euaiii<bi  -  <a-j,x(a)  -  x>,  T  i£Ix 
and 

<ai  ,x(  a)>  c  b^ ,  T  i^Ix. 

It  follows  that  x(o)  is  also  the  projection  of  the  vector  x  -  ag  on  the 

set  a  X  given  by 
x 

flx  =  {zl  <ai  ,z>  <  bi ,  ielx}. 
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x(a)  =  arg  min  { 12  -  (x  -  ag)i  |  zefl  }. 


(4.3.37) 


i  .e. , 


Now  the  vector  is  easily  seen  to  be  a  direction  of  recession  of  the  set 
,  so  by  Lemma  (4. A)  part  b)  we  have 


<dx,x(a)>  >  <dx,x  -  ag>  =  <dx,x  +  adx  adx> . 


Since  <d  ,d+>  =  0,  the  relation  above  is  written  by  using  also  (4.3.32) 

X  X 


-<d  ,x  -  x(a)>  >  a<d  ,D  d  >.  (4.3.38) 

X  XXX 


In  view  of  the  fact  dYeCv  we  have  (x  +  ad  left  ,  and  since  x(a)  is  the 

A  A  XX 

projection  on  £2X  of  (x  +  adx  +  <»dx)  [cf.  (4.3.37)]  we  have 


"T  ~ 

<x  +  adx  +  adx  -  x(a),x  +  adx  -  x(a)>  <  0. 


Equivalently,  using  the  fact  <dx,dx>  =  0, 

+  ax(a)  -  (x  +  adx)»2 

-<dx,x  -  x(a)>  >  ~  .  (4.3.39) 


By  combining  (4.3.38)  and  (4.3.39)  and  using  the  fact  Vf(x)  =  -(d  + 

x 

d+)  we  obtain 

~  2 

Bx( a)  -  (x  +  adx)  It 

< V f ( x ) ,x  -  x(a)>  >  a<dx,Dxdx>  + - - - (4.3.40) 


which  is  the  left  inequality  in  (4.3.27).  To  show  that  the  right  side  of 

(4.3.40)  cannot  be  zero,  note  that  if  it  were  then  we  would  have  both  d  =  0 

x 

(implying  *  0,  x(a)  =  x  -  aVf(x))  and  x(a)  =  x  +  [implying 
p  (x  -  aVf (x j /  =  xj.  Since  x  is  not  critical,  we  arrive  at  a  contradic¬ 
tion.  Therefore  the  right  inequality  in  (4.3.27)  is  also  proved. 

By  using  the  mean  value  theorem  we  have 
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f(x)  -  f [x( a) ]  =  <Vf(x),x  -  x(  a) >  +  <vf(;a)  -  Vf(x),x  -  x(  a)>  (4.3.41) 


where  ;a  lies  on  the  line  segment  joining  x  and  x(a).  Using  (4.3.40)  and 

(4.3.41)  we  obtain  for  all  ae(0t— Lr) 

Tg  fl 

~  2 

l  ix(a)  -  (x  +  od  )  n 

a  { f ( x)  -  f [x( a) ] }  >  <dx,Dxdx>  + - 2 -  (4.3.42) 

a 

+  <f(;  )  -  vf(x), 

a  a 

Using  (4.3.36)  and  the  Cauchy-Schwartz  inequality  we  see  that 

<vfUa)  -  vf(x),  --"-aX-(  — >  >  -  uvfUa)  -  vf ( x) » • u g u .  (4.3.43) 

Since  uvf(ca)  -  vf(x)a+0  as  a  +  0  we  see  from  (4.3.42)  and  (4.3.43)  that 
if  d^  ^  0  then  for  all  positive  but  sufficiently  small  a  we  have  f(x)  > 

»v 

f [ x(  a) ] .  If  dx  =  0  then  dx  =  0  and  using  Lemma  (4. A)  part  a) 

~  2  2 

ilx(a)  -  (x  +  ad  )  n  nx(a)  -  x»  2 

- 2 - =  - 2 -  >  »y.(l)  -  XS  ,  T  ae(0,l  ] .  (4.3.44) 

a  a 


From  (4.3.42),  (4.3.43)  and  (4.3.44)  we  see  again  that  when  d  =  0  then 

x 

for  all  positive  but  sufficiently  small  a  we  have  f(x)  >  f [x( a) ] . 

Therefore,  there  exists  a>0  such  that  (4.3.28)  holds  in  both  cases  where 

d  =  0  and  d  t  0. 
x  x 

Q.E.D. 
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4.4  Convergence  Analysis 


The  previous  section  has  shown  how  a  vector  xeX,  a  scalar  e>0,  an  index 
set  1^  satisfying 

IxCD{ieII  <a,j  ,x>  >  b|  -  eia^ll}, 

and  a  positive  definite  self-adjoint  operator  D  :r  +rY  where  r  is  the 

xxx  X 

subspace  defined  by  (4.3.6)  -  (4.3.11),  uniquely  define  an  arc  of  points 
x(a)eX,  a  >  0  where 

x( a)  =  P(x  -  ag) ,  a  >  0 


and  g  is  defined  via  (4.3.8),  (4.3.9),  (4.3.13)  -  (4.3.15).  Furthermore 

for  each  xeX  which  is  not  critical.  Proposition  lb)  shows  that  by  choosing 

a  sufficiently  small  we  can  obtain  a  point  of  lower  cost  on  this  arc. 

Therefore  any  procedure  that,  for  any  given  xeX,  chooses  I  ,  e,  and 

x 

Dx  satisfying  the  above  requirements,  coupled  with  a  rule  for  selecting  a 
point  of  lower  cost  on  the  corresponding  arc  x( a)  leads  to  a  descent 
algorithm.  There  is  a  large  variety  of  possibilities  along  these  lines  but 
we  will  focus  attention  on  the  following  broad  class  of  methods: 

We  assume  that  we  are  given  a  continuous  function  e:X-*-R  such  that 


e(x)  >  0, 
e(x)  =  0 


YxeX 

x  is  critical 


(4.4.1) 

(4.4.2) 


(for  example  e(x)  =  min  {e,»x  -  P[x-  vf ( x) J i }  where  e  >  0  is  a  given 
constant).  We  are  also  given  scalars  6e  (0,1),  oe  (0,1/2),  x  >  0  and 
>  0  with  <  x^. 
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At  the  beginning  of  the  kth  iteration  of  the  algorithm  we  have  a  vector 


xkeX*  xk  stationary  we  set  xk+1  =  x^.  Else  we  obtain  the 
vector  x^+l  as  follows: 

Step  1:  Choose  an  index  set  IQ  I  satisfying 

!k  {i£I  I  <ai  >xk>  >  bi  -  e  { xjc )  a  a  i  a } » 
and  compute 

dk  =  arg  min  { uz  +  Vf ( xfc )  l  I  zeC^} 
dx  ■  -  l7f(xk>  +  «k] 

where 

ck  =  (z  I  <a.j  ,z>  <  0,  i e I k } . 

Step  2:  Choose  a  positive  definite  self-adjoint  operator  D  :r  >r 

k  k 

rk  =  span  {Ck  0  {z  I  <z,d£>  =  0}}. 

and  D.  satisfies 
k 

n  2 
^k11  <  x2>  and  Ai a z a  <  <z,D|cz>,  T  zer^ 

Compute  dk  given  by 

3k  =  arg  min  { a z  -  Dkdk a • zeCk »  <z»dk>  =  °1* 

Define 

% 3  +  ak> 

and 

Xk(a)  =  P(xk  -  agk),  T  a>0. 


next 

(4.4.3) 

(4.4.4) 

(4.4.5) 

(4.4.6) 

,  where 

(4.4.7) 

(4.4.8) 

(4.4.9) 

(4.4.10) 

(4.4.11) 
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Step  3:  Set 


xk+l  =  xk(ak) 

where 

\  *  ^ 

and  is  the  first  nonnegative  integer  m  satisfying 

f(xk)  "  >  a  l0"1  <dk,Dkdk> 

s  Xk ( 0m)  -  (xk  +  0mdk ) 8 2 

em 

Proposition  (4. A)  part  b)  shows  that  x^+1  is  well  defined  via  the  step- 
size  rule  (4.4.12)  -  (4.4.14)  in  the  sense  that  is  a  (finite)  integer 
and  furthermore 

«*»> >  «W 

for  all  k  for  which  x^  is  not  critical.  The  following  proposition  is  our 
main  convergence  result. 

Proposition  (4.B):  Every  limit  point  of  a  sequence  {x  }  generated  by  the 

k 

algorithm  above  is  a  critical  point. 

Proof:  Let  { x  }  be  a  subsequence  of  jx  }  converging  to  a  points  which  is 
k  K  k 

not  critical.  We  will  arrive  at  a  contradiction.  Since  {c^}  is  bounded  we 
assume  without  loss  of  generality  that 

lim  a  =  cT 

k-n»  k 

keK 

where  ae[0,lj.  Since  { f ( x,  ) }  decreases  monotonically  to  f(7)  it  follows 

< 

from  the  form  of  the  stepsize  rule  that 


(4.4.12) 

(4.4.13) 

(4.4.14) 
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(4.4.15) 


Tim 

k+® 

keK 


\  <dk*Dxdx>  =  0 


1  im 
k*® 
keK 


"W  ~  lxk  *  vV"2 

ok 


=  0 


(4.4.16) 


We  consider  two  cases: 

Case  1  (oT  >  0) :  It  follows  from  (4.4.15)  and  the  fact  <d,  ,D.  d.  >  id  n2 
-  k  k  k  lk 

(cf.  (4.4.8))  that  1  im  d  =  0,  and  therefore  also  lim  3"  =  0,  lim  d+  =  vf(T). 

k-f»  k  k-*-®  k  k>®  k 

keK  keK  keK 

By  taking  limit  as  k+®,  keK,  in  the  equation  =  P(xk  +  c^dj  +  o)fdk), 

using  the  coninuity  of  the  P  operator  which  follows  because  P  is  nonexpan- 
si  ve  we  obtain 

lim  x  (a  )  =  P[7  -  liffx)  ]. 

k+on  k  k 

keK 

Therefore  (75)  yields 


x  =  P[x  -  aVf ( x )  ] . 


Since  "a  >  0  this  implies  that  7  is  critical  thereby  contradicting  our 
earlier  assumption. 

Case  2  (a=0) :  It  follows  that  for  all  keK  which  are  sufficiently  large 


f<V 


nviri] 


<  a 


<dk 


IX| 


•Dkdk>  + 


k(-) 


(Xk 


+  jA’ 


k 

T 


},  (4.4.17) 


i.e.,  the  test  (4.4.14)  of  the  stepsize  rule  will  be  failed  at  least  once 
for  all  keK  sufficiently  large. 
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Since  g^  =  -(d*  +  d^),  <c*^>3k>  =  0,  we  have 


2  +2  -  2 
ugka  =  »dka  +  ndkn  . 


(4.4.18) 


Since  dk  is  the  projection  of  on  (\  {z  \  <z,dk>  =  0}  we  must  have 

^  s*w 

"V  <  “Dkdk 11  and»  us1  n9  (4.4.8) ,  adku  <  X2adka.  Therefore  from  (4.4.18) 

and  the  fact  id+ii  <  «vf(x  )ii,  id  u  <  avf(x  )i  we  obtain 
k  k  k  k 


ag  a2  <  (1  +  X2)  a 7f( x  )«2. 
K  2  k 


It  follows  that 


1  im  sup  ag.  a  <  °°. 
k>«  K 


(4.4.19) 


We  also  have 


lim  e(x  )  *  eCx)  >  0. 
k*o»  k 

keK 


(4.4.20) 


It  follows  from  (4.4.19),  (4.14.20)  and  the  fact  a  =  0  that  for  all  keK 

%  ^ 

sufficiently  large  _  e(0,  _ L_)  and  therefore  using  Proposition  (4. A) 

^  ®  9k  ^ 

part  b)  [cf.  (4.3.27)]  we  obtain 

a  a  sx  (_k)  -  (x  +  _kff  )  II2 

<Vf(x  ),x  -  x  (_k)>  >  _k  <d  ,D  d  >  +  k  g _ k  g  k  .  (4.4.21) 

kk  kg  g  kkk  ^ 


Using  the  mean  value  theorem  we  have 


f ( x  )  -  f  [x  (Ji)  j  =  <vf(x  )  ,x  -  x  (i>  + 
k  kg  k  k  kg 


(4.4.22) 
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+  <vfUk)  -  Vf(xk),xk  -  Xk(— )> 

tt 

where  ck  lies  on  the  line  segment  connecting  xk  and  xk(-jp).  From  (4.4.17), 
(4.4.21),  and  (4.4.22)  we  obtain  for  all  keK  sufficiently  large 


(1-a)  { <dk ,Dkdk>  + 


,xk(r' 


(\)2 


x  -  x  ( JL) 

<  <vf ( x  )  -  vfU, ),  k  k  g  >. 

k  k  afc 


(4.4.23) 


Since  [cf.  (4.3.36) ,(4.4. 19)  ]  we  have 


lim  sup 
k*°° 
keK 


ii  x,  -  xt  &  n 
k  k  0 


lim  sup  ug  ii  <  ® 
k+°®  k 

keK 


and  lim  uvf(x  )  -  7f(c  )«  =  0  it  follows  that  the  right  side  of  (4.4.23)  tends 
k+<»  k  k 

keK 

to  zero  as  k-*°°,  keK.  Therefore  so  does  the  left  side  which  implies  that 


lim  dk  =  0, 
k-n» 


lim  d,  =  0 
k+«  K 


(4.4.24) 


ii  x.  {.^)  -  (x  +  )  »2 

kb  k  b  k 

iV 


(4.4.25) 
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ft 


L< 


Since  it  follows  from  (4.4.20)  and  (4.4.24)  that  there  exists  7  such  that 


x  +  _!i  eX  f  k  >  T( 

k  3  k 


we  obtain  using  Lemma  la) 


ak  2 

<*k(r>  -  <xk  +rdk)' 


ak  2 

(r) 


>  »Pr'x.  +  )  +  d+] 

k  3  k  k J 


(x  +  )«2. 

k  3  k 


(4.4.26) 


From  (4.4.25)  and  (4.4.26)  it  follows  that 


ak  ~ 

lim  uP[(xk  +  J-  dk) 


k+oo 

keK 


a|<  ~  2 

(Vf(xk)  +  dk)j  -  (xk  +  f  dk)l  =  0. 


Using  (4.4.24)  we  obtain 

iP[x  -  vf(7)  J  -  7a  =  0 

which  contradicts  the  assumption  that  7  is  not  critical.  Q.E.D.  . 

We  mention  that  some  of  the  requirements  on  the  sequences  (e(x^)}  and 
{ }  can  be  relaxed  without  affc-cting  the  result  of  Proposition  ( 4. B ) .  In 
place  of  continuity  of  e(*)  and  assumption  (4.4.8)  it  is  sufficient  to  require 
that  if  (x  }  is  a  subsequence  converging  to  a  noncritical  point  7,  then 

K  )\ 


lim  inf  e(x.  )  >  0, 
k-»~  k 
keK 

lim  inf  inf  {<z,D  z> 
k~  k 

keK 

1  im  sup  « D  II  <  «. 

k+oo  k 

keK 


l  zu  = 


=  1,  zeTk}  >  0 
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I 


This  can  be  verified  by  inspection  of  the  proof  of  Proposition  (4.B). 

A  practically  important  generalization  of  the  algorithm  results  if  we 
allow  the  norm  on  the  Hilbert  space  H  to  change  from  one  iteration  to  the 
next.  By  this  we  mean  that  at  each  iteration  k  a  new  inner  product  <•»•>. 

and  corresponding  norm  n*«k  on  H  are  considered.  The  statement  of  the 
algorithm  and  corresponding  assumptions  must  be  modified  as  follows: 

a)  The  gradient  vf(x^)  will  be  with  respect  to  the  current  inner 
product  <•>•>  [cf.  (4.2.3)]. 

b)  The  projection  defining  dk,  dk,  dk  and  the  arc  xkM  should  be  with 
respect  to  the  current  norm  c*»k. 

c)  The  assumptions  on  I  ,  and  D  ,  and  the  stepsize  rule  should  be  restated 

N  l\ 

in  terms  of  the  current  inner  product  and  norm. 

There  is  no  difficulty  in  reworking  the  proof  of  Proposition  (4.B)  for 
this  generalized  version  of  the  algorithm  provided  we  assume  that  all  the 
norms  u*ii^,  k  =  0,  1,...  are  “equivalent"  to  the  original  norm  ii»ii  on  H  in 
the  sense  that  for  some  m  >  0  and  M  >  0  we  have 

miizii  <  y zu k  <  M a z ii ,  T  zeH,  k  =  0,1,...  . 

Naturally  the  norms  i»ik  should  be  such  that  projection  on  X  witn  respect 
to  any  one  of  them  is  relatively  easy  for  otherwise  the  purpose  of  the 
methodology  of  this  paper  is  defeated.  The  motivation  for  considering  a 
different  inner  product  at  each  iteration  stems  from  the  fact  that  it  is 
often  desirable  in  nonlinear  programming  algorithms  to  introduce  iteration- 
dependent  scaling  on  the  optimization  variables.  This  is  sometimes 
referred  to  as  "preconditioning".  The  use  of  the  operator  Dk  fulfills  that 
need  to  a  great  extent  but  while  this  operator  scales  the  component  d  of 

X 

the  negative  gradient,  it  does  not  affect  at  all  the  second  component  d+. 
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The  role  of  an  iteration-dependent  norm  can  be  understood  by  considering 
situations  where  the  index  set  I  is  so  large  that  the  cone  C  is  empty. 

K  K 

In  this  case  =  -  vf ( ) ,  d^  =  0  and  the  kth  iteration  reduces  to  an 
iteration  of  the  original  Goldstein-Levitin-Pol jak  method,  for  which 
practical  experience  shows  that  simple,  for  example  diagonal,  scaling  at 
each  iteration  can  sometimes  result  in  spectacular  computational  savings. 


4.5  Rate  of  Convergence 


In  this  section  we  will  analyze  the  rate  of  convergence  of  algortihm 
(4.4.3)  -  (4.4.14)  for  the  case  where  X  is  polyhedral  and  H  is  finite  dimen¬ 
sional.  An  important  property  of  the  Gol dstein-Levi tin-Pol jak  method 
[cf.  (4.1.7)]  is  that  if  it  generates  a  sequence  {x^}  converging  to  a 
strict  local  minimum  7  satisfying  certain  sufficiency  conditions  (compare 
with  [7]),  then  after  some  index  7  the  vectors  x^  lie  on  the  manifold  of 

active  constraints  at  7,  i.e.,  x.  e7  +  N  where 

k  x 

=  {z I <a^  ,z>  =  0,  V  1  eA^j  (4.5.1) 

and  where 

A  =  { i  lid,  <a^,7>  s  b^).  (4.5.2) 

Our  algorithm  preserves  this  important  characteristic.  Indeed,  we  will  see 
that,  under  mild  assumptions,  our  algorithm  "identifies"  the  set  of  active 
constraints  at  the  limit  point  in  a  finite  number  of  iterations,  and  sub¬ 
sequently  reduces  to  an  unconstrained  optimization  method  on  this  subspace. 
This  brings  to  bear  the  rate  of  convergence  results  available  from 
unconstrained  optimization. 

The  rate  of  convergence  analysis  will  be  carried  out  under  the 
following  assumptions: 

(4. A)  H  is  finite  dimensional,  X  is  polyhedral,  f  is  continuously  Frechet 
differentiable,  and  vf  is  Lipschitz  continuous  on  bounded  sets,  i.e.,  for 
every  bounded  set  there  exists  L  >  0  such  that  for  every  x  and  y  in  the  set 
we  have 


i  vf(x)  -  vf (y )  u  <  Lux  -  yti . 


(1 ,5,3) 
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(4.B)  x  is  a  strict  local  minimum  and  there  exists  6>0  such  that 

P(y)ex  +  -V-  y  such  that  fx  -  vf{x)  -  y«  <  6  (4.5.4) 

(4.C)  the  function  e(x)  in  the  algorithm  has  the  form 

e(x)  =  min  (e,Ix  -  P[x  -  7f (x) ] I }  (4.5.5) 

where  e>0  is  a  given  scalar.  Furhtermore  the  set  I,  in  the  algorithm  is 

k 

chosen  to  be  [cf.  (4.4.3)] 

Ik  =  {iel|<a1,x)(>  >  b.  -  e(xk)  la.  a}.  (4.5.6) 

The  Lipschitz  condition  (4.5.3)  is  satisfied  in  particular  if  f  is 
twice  continously  differentiable.  Condition  (4.5.4)  is  a  weakened  version 
of  an  often  employed  regularity  and  strict  complementarity  assumption  which 
requires  that  the  set  of  vectors  {a.  |ieA_$  is  linearly  independent  and  all 
Lagrange  multipliers  corresponding  to  the  active  constraints  are  strictly 
positive.  The  form  (4.5.5)  for  e(x)  is  required  for  technical  purposes  in 
our  subsequent  proof.  The  reader  can  verify  that  there  are  other  forms  of 
e(x)  that  are  equally  suitable.  Finally  the  choice  (4.5.5)  for  the  set 
1^  is  natural  and  is  ordinarily  the  one  that  is  best  for  algorithmic 
purposes. 

the  follwoing  proposition  allows  us  to  transfer  rate  of  convergence 

results  from  unconstrained  minimization  to  alforithm  (4.4.2)  -  (4.4.14). 

Proposition  (4.C):  Let  x  be  a  limit  point  of  the  sequence  {x  }  generated 

k 

by  iteration  (4.4.3)  -  (4.4.14),  and  let  Assumptions  (4. A)  -  (4.C)  hold. 

Then 

lim  x  =  x 
k+®  k 
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and  there  exists  k  such  that  for  all  k  >  k  we  have 


xRex  +  N  (4  5.7) 

rk  =  span{Ck  {z|<z,dk>  =  0}}  =  ,  (4.5.8) 

d  =  arg  min[«vf(x  )  +  zlljzeN  },  (4.5.9) 

xk  +■  1  =  xk  +  ak^k^k '  (4.5.10) 

mk 

where  a.  =  3  and  m,  is  the  first  nonnegative  integer  m  for  which 

K  K 

f(xk)  -  f[xk(0m)]  >  cr0m  <dk.Dkdk>.  (4.5.11) 

The  proof  of  Proposition  (4„C)  is  given  in  Appendix  (4.B).  From 

(4.5.10)  and  (4.5.11)  we  see  that  eventually  the  method  reduces  to  an 

unconstrained  minimization  method  on  the  manifold  7  +  N^.,  The  proposition 

shows  that  if  the  matrix  Dk  is  chosen  so  that  for  all  k  sufficiently  large 

it  is  equal  to  the  inverse  Hessian  of  f  restricted  on  the  manifold  7  +  N_ 

x 

then  the  method  essentially  reduces  to  the  unconstrained  Newton  method  and 
attains  a  superlinear  rate  of  convergence. 

4.6  Algorithmic  Variations 

Many  variations  on  iteration  (4.4.3)  -  (4.4.14)  are  possible.  One  of 
them,  changing  the  metric  on  the  Hilbert  space  H  from  iteration  to  itera¬ 
tion,  was  discussed  at  tne  end  of  Section  4.4.  In  this  section  we  discuss 
other  variations.  These  will  include  the  use,  in  various  cases,  of  a 
pseudometric  on  H  instead  of  a  metric,  variations  on  the  step  size  rules 
and  finally  variations  on  the  various  projections  in  (4.4.3)  -  (4.4.14). 

We  will  state  the  variations  without  a  convergence  proof.  In  each  case, 
the  reworking  of  the  proofs  of  Sections  4.3  -  4.4  to  show  that  the 

variation  is  valid,  poses  no  difficulty. 
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4.6.1  Singular  Transformation  of  Variables  through  a  Pseudometric 

Here  we  address  the  case  where  X  is  not  a  solid  body  in  H,  i.e.  for 
some  linear  manifold  M  we  have  XCH  *  H.  In  this  case  we  observe  that 
(4.3.27)  is  the  only  place  where  a  metric  as  opposed  to  a  pseudometric  is 
needed.  Noticing  that  if  XCM,  then  all  quantities  in  (4.3.27)  belong  to 
M,  one  can  conclude  that  all  that  is  necessary  is  to  have  a  metric  on  M. 

This  leads  us  to  consider  the  use  of  pseudometric  on  H  provided  it  induces 
a  metric  on  M.  Furthermore,  we  can  change  the  pseudometric  on  H  from 
iteration  to  iteration,  as  we  can  change  the  metric,  provided  that  the 
metrics  induced  on  M  are  equivalent  in  the  sense  described  in  Section  4.4. 

The  introduction  of  a  pseudometric  serves  to  facilitate  the  projection 
further.  In  Example  (4.C)  below  we  rework  example  (4.B)  using  a  pseudometric. 
Specifically  we  assume  that  for  some  Te{l,...,n)  [cf.  (4.3.17)]  s1  is 

n  i 

zero.  The  resulting  pseudometric  (4.3.17)  restricted  to  M  =  (x l Y  xn=0} 

i  =  l 

induces  a  metric  on  M.  We  define  projections  on  M  taking  limits  in  Example 
(4.B)  as  s1  approaches  zero.  Although  the  gradient  of  f  with  respect  to 
the  pseudometric  does  not  exist,  the  limit  of  the  projection  of  vf ( x )  on  M 
as  s1  approaches  zero  does  exit  and  it  is  the  unique  vector  q(x)eM 
satisfying 


f(z)  =  f ( x )  +  q(x)'S(z  -  x)  +  Oil z  -  x#  T  zeM,  T  xeM. 

where  S  is  a  diagonal  matrix  with  s1  in  its  (i,i)  position. 

1  1 
Example  4.C:  Let  s  in  Example  (4.B)  be  zero.  By  taking  limit  as  s  -*-0 

we  derive  that  _ 

Ix  =  [HI,  or  0  <  x1  <  e/  y ^  T  i*i} . 

In  order  to  find  dx  we  have  to  take  the  limit  in  (4.3.20).  There  is  one 
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case,  and  for  our  purposes  an  Important  one,  in  which  this  limit  is 
readily  obtained.  Assume  without  loss  of  generality  that  i  satisfies 


Then 


T  =  arg  mi n  ^3f(xH . 
ie{l . n)  3X1 


1  /  3  f  ( X )  3  f  (  X ) 


.1 


i  ax1 


d1  = 

X 


ax 


l  /af(x)  af 


V 


fil  7  l  ,  T  axj 


i*i 


1-T, 


and 


i.  - 


1  Lief  and  ifUI<iIU) 

1  x  t 

d*  ax 


It  can  be  seen  using  the  definitions  that 


~  A 

I  =  I  -  T 
x  x 

and  therefore  among  all  choices  of  T,  the  one  made  above  results  in  the 

<v 

maximal ity  of  Ix. 

The  advantage  of  such  a  pseudometric  is  easily  manifested,  once  it  is 
realized  that  by  using  it  we  dispense  with  the  need  to  solve  the  piecewise 
linear  equations  in  example  (4.B)  [cf. (4.3.19) ,  (4.3.24)].  On  the  other 
hand  examples  involving  a  quadratic  objective  can  be  constructed  in  which, 
when  using  a  metric  the  algorithm  converges  in  one  step,  while  with  the  use 
of  a  pseudometric  it  takes  several  steps.  This  will  happen  when  v2f  is 
diagonal  positive  definite. 
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4.6.2  Step  Size  Rules 


The  Armijo-like  rule  (4.4.14)  can  be  viewed  as  combination  of  the 

» 

Armijo  rule  used  in  unconstained  minimization  [9],  and  an  Armijo  like  rule 
for  constrained  optimization  proposed  by  Bertsekas  in  [[7],  cf.  eq  (12)]. 
Corresponding  to  an  alternate  suggestion  made  there  [[7],  cf.  eq  (22)]  we 
can  replace  (4.4.14)  by 

f(xk)  -  f(xk(0m))  >  (4.6.1) 

> 

a{0m  <d|< ,0kdk>  +  <Vf(xk),(xk  +  8mdk)  -  xk ( 6m) > } . 

Also,  a  variation  of  the  Goldstein  step  size  rule  [9]  can  be  employed,  in 

which  a  <  0.5  and  a  is  chosen  such  that  * 

(1  -  o)  {a<dk,Dkdk>  +  <7f(xk)>(xk  +  a^k)  ‘  x|<(a)>}  > 
f(xk)  "  f(xk(°0)  >  o  {a<d^ ,D^d^>  + 

<vf{xk),(xk +  aV  ■  xk(a)>}*  (4.6.2) 

The  rule  (4.6.2)  is  the  counterpart  of  (4.6.1).  The  reader  can  easily 

construct  the  counterpart  to  (4.4.14). 

4.6.3  Variations  on  the  Projections 

There  is  one  central  observation, namely ,  the  projections  of  D  d  and 

k  k 

dk  on  any  closed  convex  set  for  which  dk  is  a  direction  of  recession, 
result  in  descent  directions.  By  employing  different  sets  with  this  pro¬ 
perty,  variations  on  the  algorithm  result  since  different  directions  may  be 
obtained  and  different  arcs  may  be  searched. 

The  first  variation  is  to  replace  C  in  (4.4.9)  by  (a  -  x  ),  i.e. 

k  k  k 

~  , 

d k  =  arg  min  { a z  -  Dkdkn  |  zeQk  -  xk»  <'z»c*k'>  =  (4.6.3) 
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where 


a.  =  { z i  <a.  ,z>  <  b. ,  Y  iel.  }. 
k  1  i  i  k j 

Evidently 

ak  ~  xk  Ck 

and  as  a  result  ^  is  a  direction  of  recession  of  ak  -  xk,  which  implies 
that  3^  defined  by  (4.6.3)  is  a  descent  direction. 

Interestingly,  this  variation  gives  rise  to  a  variation  in  the  stepsize 
search.  Since  the  set  { z I  zeC.  ,  <z,d*>  =  0}  is  a  cone,  it  is  easily 

*>  l\ 

verified  that 

ad  =  arg  min  {AaDkdk  -  zilzeCk,  <z,dk>  =  0}. 

Thus.  (4.4.11)  can  be  interpreted  as 

Xk(a)  =  P[X|(  +  adk  +  qk(a)] 

where 


qk(a)  =  arg  min  {«aDkdk  -  zilzeCk,  <z,dk>  =  0}. 

Since  C  can  be  replaced  by  a  -  x  ,  a  new  algorithm  results  by  using  the 
k  k  k 

arc 

xk ( a)  =  P[xk  +  adk  +  qk(a)] 

where 


qk(a)  =  arg  min  {«aDkdk  -  zii  I  zeftk  -  xk,  <z,dk>  =  0}. 

Indeed,  the  particular  algorithm  suggested  in  [6]  can  be  considered  to  be 
an  implementation  of  the  last  variation  for  an  orthant  constraint. 
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4.7.  Multicommodity  Network  Flow  Problems 


» 


4.7.1  Application  of  the  Algorithm  to  the  Multicommodity  Flow  Problem 
__  _  , 

In  this  last  section  we  apply  algorithm  (4.4.3) -( 4.4. 14)  to  a  classical 
nonlinear  multicommodity  network  flow  problem  and  present  some  com¬ 
putational  results. 

i 

We  consider  a  network  consisting  of  N  nodes,  1,  2 . N,  and  a  set  of 

directed  links  denoted  by  £.  We  assume  that  the  network  is  connected  in 
the  sense  that  for  any  two  nodes  m,  n,  there  is  a  directed  path  from  m 
to  n.  We  are  given  a  set  W  of  ordered  node  pairs  referred  to  as 
origin-destination  (or  OD)  pairs.  For  each  OD  pair  weW,  we  are  given 
a  set  of  directed  paths  P  that  begin  at  the  origin  node  and  terminate 

W  r 

at  the  destination  node.  For  each  weW  we  are  also  given  a  positive  scalar 

r  referred  to  as  the  input  of  OD  pair  w.  This  input  must  be  optimally 
w 

divided  among  the  paths  in  P  so  as  to  minimize  a  certain  objective  func-  f 

w 

tion. 

For  every  path  peP  corresponding  to  an  OD  pair  weW  we  denote  by 

w 

xP  the  flow  travelling  on  p.  These  flows  must  satisfy  , 

,  P 

l  x  =  rw  T  weW  (4.7.1) 

pePw 

xP  >  0  T  pePw,  weW  (4.7.2) 

Equations  (4.7.1),  (4.7.2)  define  the  constraint  set  of  the  optimization 
problem  -  a  Cartesian  product  of  simplices. 

In  Examples  (4.B)  and  (4.C)  we  discussed  the  application  of  our  method 
to  the  case  of  a  simplex  constraint.  It  is  not  difficult  to  see  that  if  we 
take  a  "diagonal"  metric  on  the  space,  the  multicommodity  flow  problem 
decomposes  in  the  sense  explained  below. 
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Let.  x  denote  the  vector  of  variables  y 3,  peP  ,  weW,  and  let  xw 

w 

n  W  W 

denote  the  vector  of  variables  xp,  pePw.  Let  Cx(x  )  and  rx(x  )  denote  the 
cone  and  subspace,  respectively,  in  R*w*,  generated  at  x,  when  all  variables 
aside  from  those  in  xw  are  considered  fixed  and  e  =  e(x).  Then 


we 


Vf ( X )  =  (...,v  wf(x),...) 

X 


and 


r 


X 


we 


r  (xw). 

^  A 


Thus  all  projections  decompose  and  therefore  in  many  respects  the  multicom¬ 
modity  flow  problem  is  not  different  from  the  problem  with  a  single  simplex 
constraint.  The  only  points  where  the  "interaction"  among  the  simplices 
appears  is  in  computing  e  ,  and  in  computing  D  d  . 

lx 

To  every  set  of  path  flows  {xplpsP  ,  weW}  satisfying  (4.7.1),  (4.7.2) 

w 

there  corresponds  a  flow  fa  for  every  link  ae£.  It  is  defined  by  the 
relation 

f3  =  l  I  lD(a)x?  T  ae£  (4.7.3) 

weW  pePw 

where  1  (a)  =  1  if  the  path  p  contains  the  link  a  and  1  (a)  =  0  otherwise. 

P  P 

If  we  denote  by  f  the  vector  of  link  flows  we  can  write  relation  (4.7.3))  as 
f  =  Ex  (4.7.4) 

where  E  is  the  arc-chain  matrix  of  the  network. 
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For  each  link  ae£  we  are  given  a  convex,  twice  continuously  differen¬ 
tiable  scalar  function  D  (fa)  with  strict  positive  second  derivative  for 

d 

all  fa  >  0.  The  objective  function  is  given  by 

For  each  link  ae£  we  are  given  a  convex,  twice  continuously  differen¬ 
tiable  scalar  function  D  (fa)  with  strict  positive  second  derivative  for 

a 

all  fa  >  0.  The  objective  function  is  given  by 

0(0  -  I  0  .(fa).  (4.7.5) 

ae£ 

By  using  (4.7.4)  we  can  write  the  problem  in  terms  of  the  path  flow 
variables  xP  as 


minimize  J(x)  =  D(Ex) 
subject  to: 


v  P 

i  X  =  rw  Y  weW 

pePw 

P 

x  *  0  Y  pePw,  weW. 

In  communication  network  applications  the  function  D  may  express,  for 
example,  average  delay  per  message  [10], [11]  or  a  flow  control  objective 
[12],  while  in  transportation  networks  it  may  arise  via  a  user  or  system 
optimization  principle  formulation  [13], [14], [15].  We  concentrate  on  the 
separable  form  of  0  given  by  (4.7.5),  although  what  follows  admits  an  exten¬ 
sion  to  the  non-separable  case. 

A  Newton-like  method  will  be  obtained  if  we  chose  0  d  so  that  x  + 

k  k  k 

D  d  is  the  minimum  of  the  quadratic  approximation  to  f  on  x  t  r  .  For 
K  x  k  k 

this  we  must  find  AY  where  Y  solves 


minimize  <VJ(xk),Av>  +  7  <Av,v  J(xjc)Av> 


(4.7.6) 


and  where  A  is  a  matrix  such  that  its  columns  are  linearly  independent  and 
span  r  . 

The  particular  structure  of  the  objective  function  (4.7.5)  gives  rise 
to  a  Hessian  matrix  which  makes  the  solution  of  (4.7.6)  relatively  easy  to 
obtain.  Indeed,  using  (4.7.5)  we  can  rewrite  (4.7.6)  as 


minimize  <E'vD(f  ),Av>  +  l/2<Av,E' 72D(f,  )EAv>,  (4.7.7) 

y  K  K 

where  f  =  Ex  .  A  key  fact  is  that  problem  (4.7.7)  can  be  solved  by  the 
k  k 

Conjugate  Gradient  (C-G)  method  using  graph  type  operations  and  the  fact 

that  v2D(f  )  is  diagnonal  without  explicitly  storing  the  matrix 
k 

A'E' V2D(fk)EA. 

Note  that  a  solution  to  (4.7.7)  exists  since  E ' VD( f ^ )  is  in  the  range  of  the 

2 

nonnegative  definite  matrix  E'V  D(fk)E. 

4.7.2  Implementation  of  the  C-G  Method 

A  scaled  version  of  the  C-G  method  solves  the  unconstrained  minimiza¬ 
tion  problem 

min  b'z  +  ^  z'Hz  (4.7.8) 

z 


as  follows: 

A  positive  definite  symmetric  matrix  S  is  chosen,  and  a  sequence  {z  } 

m 

is  generated  according  to  the  iteration 


zn  =  °»  z  i  =  z  +  y  P  » 
u  m+1  m  mm 


m  0,1,..., 


where  the  conjugate  direction  sequence  {p^}  is  given  recursively  by 
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pn  =  "Sr n’  Pm  =  “Sr  +  6  P  ,« 
u  0  m  m  m  in- 1 


the  residual  sequence  {r^}  is  defined  by 


r_  =  Hzm  +  b, 
m  m 


and  the  scalars  y  and  3  are  given  by 
m  m  =>  j 


r 1  Sr 
m  m 

Ym  =  p'Hp  > 
m  m 


r'Sr 
m  m 


m  r'  i  Sr, 


m-1  irm-l 


m  =  1,2, 


9 


m  ~  0)l|(i(j 


m  *  1  y  2  y  •  •  ■  * 


Applying  the  method  to  (4.7.7.)  with  the  inner  product  that  corresponds  to 
the  identity  matrix  and  S  =  A'SA  where  S  is  a  diagonal  matrix,  a  sequence 
{v  }  is  generated  according  to 


v0  =  0  v-i  =  v~  +  t_P_  » 


m+1  m  nf 


m 


m  “ 


p  =  -A'W  ,  p  =  -A'SAr  +pp  ,  m  =  1,2,..., 
u  urn  m  m  m-1 

rm  =  A'E'720(f JEAv  +  A'E'70(f,  ),  m  =  0,1 . 

»>  Ml 


Y 

m 


r'A'TAr 
_ m  m 

p'A'E' V^D(f.  )EAp 
m  k  m 


m  0,1,..., 


0 

m 


r '  A  'TAr 
m  m 


A'SArm-l 


m  =  1,2,... 


Since  we  are  interested  in  Av  we  can  work  directly  with  a  sequence  {Avm} 
which  will  be  given  by 
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I 


i 


Av  =  0  ,  (Av  )  =  (Av  )  +  y  p  ,  m  =  0,1 . 

u  m+1  m  m  m 

p  =  -(AA'jlr  p  =  -(AA')Sr  +  6  p  m  =  1,2,..., 

u  U  m  m  m  m-1 

rm  =  (AA')[E'72(fk)E(Avm)  t  E'VD<fk)]  m  =  0,1 . 

r"Sr 

m  m _ 

^  ‘  p;E'720(fk)Epm  m  =  0,1 . 


» 


» 


0 

m 


r*Tr 
m  m 


rm-l^rm-l 


m  =  1,2,... 


As  is  well  known  ([44],  [43])  this  method  will  find  the  solution  Av 

of  (4.7.7)  in  at  most  n  steps  where  n  is  the  dimension  of  r  ,  regardless 

k  k  k 

of  the  choice  of  “5.  Since  n  might  be  huge  we  are  primarily  interested  in 
an  approximate  implementation  whereby  only  a  few  C-G  iterations  of  the 
method  are  performed.  Under  these  circumstances  the  choice  of  S’  can  have  a 
substantial  effect  on  the  quality  of  the  final  solution.  A  suitable  choice 
is  to  take  S  to  be  the  inverse  to  the  diagonal  approximation  to 

AA,E'v2D(fk)EAA'. 

The  key  to  the  distributed  manner  and  the  facility  with  which  the  C-G 
steps  above  can  be  carried  out  is  the  interpretation  of  the  various  matrix 
multiplications  as  graph  type  operations.  Since  AA’  is  block  diagonal  with 
respect  to  0D  pairs  and  v20(f  )  is  diagonal  the  only  "real"  multiplication 

In 

is  by  E  and  E'.  Thus  for  example  to  compute 

E‘ V20(f  ) E(Av  ) 
k  m 


I 


I 


I 


I 
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(notice  that  E ' v20( f  ) Ep  can  be  derived  from  it)  we  interpret  (Av  )  as  a 

km  m 

vector  of  path  flows  and  then  EAv  is  the  corresponding  vector  of  link 

m  * 

flows;  likewise  we  interpret  V2D(f  )EAv  as  a  vector  of  link  delays  and 

k  m 

then  E'v2D(f  )EAv  is  the  corresponding  vector  of  path  delays.  Thus  if 
km 

nodes  and  links  are  viewed  as  processors  communicating  by  exchanging 

» 

messages.  The  multiplications  above  can  be  carried  out  by  sending  messages 
back  and  forth  along  paths.  The  other  multiplication  needed,  that  of  two 
vector  of  the  dimension  of  the  paths,  as  well  as  coordinating  the  end  and 
the  beginning  of  a  new  iteration  can  be  done  by  flooding  or  by  employment 
of  a  "leader"  node. 

As  the  reader  can  see, the  linearity  of  the  link  flows  in  the  path  flows  I 
is  essential  for  the  method.  This  precludes,  for  example,  the  use  of  frac¬ 
tions  as  routing  variables.  Yet,  working  in  the  space  of  path  flows  has 
its  price.  The  number  of  paths  can  be  exponentially  large.  We  propose  two  I 

remedies.  Both  are  not  attractive. 

The  standard  way  of  working  in  the  space  of  path  flows  is  to  maintain  a 
list  of  subset  of  the  paths,  those  that  currently  carry  flow,  appending  to  I 

it  at  the  end  of  each  iteration  the  path  of  the  smallest  marginal  delay. 

If  one  appends  a  new  path  only  if  the  improvement  in  the  marginal  delay, 
compared  to  the  current  minimal  one  in  the  subset,  is  above  some  threshold,  i 

then  it  looks  like  the  number  of  paths  used  will  be  reduced.  Alternative¬ 
ly,  one  can  work  with  a  restricted  a  priori  fixed  set  of  paths  for  each  00 
pai r.  i 

Overcoming  the  exponential  growth  of  the  number  of  paths  which  carry 
flow  (which  does  not  necessarily  happen  for  a  centralized  algorithm)  is  a 
topic  for  further  research. 

4.7.3  Computational  Results: 

A  version  of  the  algorithm  was  run  on  three  examples  of  the 
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I 


multicommodity  flow  problem.  The  networks  are  shown  in  Figures  4.5  -  4.7. 
Each  OD  pair  was  restricted  to  use  only  two  prespecified  paths.  This 
reduced  the  programming  load  significantly,  yet  captured  the  essence  of  the 
algorithm.  It  is  conjectured  that  the  results  we  obtained  are  represen¬ 
tative  of  the  behavior  of  the  algorithm  when  applied  to  more  complex  multi - 
commodity  flow  problems. 

Since  there  was  no  difficulty  in  deciding  whether  the  particular 
run  was  convergent  due  to  the  fact  that  superl inear  convergence  can  be 
detected  by  observing  the  reduction  of  the  cost  function,  we  decided  to 
implement  a  heuristic  version  of  the  algorithm  and  thereby  save  com¬ 
putational  overhead.  By  and  large,  we  have  tried  to  avoid  the  line  search. 
Without  it,  aside  from  minimal  coordination  in  doing  vector  multiplication 
and  the  graph  type  operations  mentioned  above,  the  algorithm  decomposes 
according  to  the  OIJ  pairs.  We  tried  various  simple  stopping  rules  for  the 
C-G  iteration.  We  have  settled  on  the  one  which  will  be  described  shortly. 

The  algorithm  was  operated  in  three  modes  distinguished  by  the  rules 
according  to  which  the  C-G  method  was  stopped.  In  the  first  mode  (denoted 
by  Newton)  the  C-G  iteration  was  run  to  the  exact  solution  of  problem 
(4.7.7).  In  the  second  mode,  (denoted  by  Approximate  Newton)  the  C-G  itera¬ 
tion  was  run  until  its  residual  was  reduced  by  a  factor  of  1/8  over  the 
starting  residual.  Finally,  in  the  third  mode  the  C-G  method  was  allowed 
to  perform  only  one  step  (to  be  denoted  by  1-step).  We  used  different 
values  for  different  OD  pairs,  according  to  a  variation  of  (4.4.1)  (with 
e  =  0.2). 

The  projection  metric  we  used  was  of  the  pseudometric  type  as  suggested 
in  Example  (4.C).  In  all  three  modes,  in  addition  to  their  particular 
stopping  rule,  the  C-G  method  was  stopped  whenever  for  any  OD  pair  w  the 
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flow  on  the  path  with  the  smaller  entry  in  the  gradient  became  negative. 

The  last  point  in  the  sequence  of  points  generated  by  the  stopped  C-G 
method  subiteration  was  connected  by  a  line  to  the  point  preceeding  it. 

The  point  on  the  line  at  which  the  particular  path  flow  became  zero  was 
taken  as  the  result  of  the  C-G  iteration. 

We  used  two  types  of  objective  functions.  The  first  is 

Da(f3)  = - -  Y  ae£ 

C  -  fa 

a 

where  C  is  a  given  positive  scalar  expressing  the  "capacity"  of  link  a. 

d 

This  function  is  typically  used  to  express  queuing  delay  in  communication 
networks  following  Kleinrock's  Independence  Assumption  [19].  The  second 
type  was  taken  to  be  quadratic.  We  used  two  sets  of  inputs,  one  to  simu¬ 
late  heavy  loading  and  one  to  simulate  light  loading.  For  each  combination 
of  cost  function,  input  and  network  we  present  the  results  corresponding  to 
the  three  versions  in  Tables  4.3,  4.6  and  4.9. 

Our  main  observation  from  the  results  of  Table  4.3,  4.6  and  4.9  as 
well  as  additional  experimentation  with  multicommodity  flow  problems  is 
that  in  the  early  iterations  the  1-step  method  makes  almost  as  much 
progress  as  the  other  two  more  sophisticated  methods  but  tends  to  slow  down 
considerably  after  reaching  the  vicinity  of  the  optimum.  Also  the  approxi¬ 
mate  Newton  method  does  almost  as  wet  I  as  Newton's  method  in  terms  of 
number  of  iterations.  However  the  computational  overhead  per  iteration  for 
Newton's  method  is  considerably  larger.  This  is  reflected  in  the  results 
which  show  in  ten  cases  out  of  twelve  a  larger  number  of  conjugate  gradient 
subiterations  for  Newton's  method.  Throughout  our  computational  experi¬ 
ments  the  approximate  Newton  method  based  on  conjugate  gradient  subitera¬ 
tions  has  performed  very  well  and  together  with  its  variations  is  in  our 
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view  the  most  powerful  class  of  methods  available  at  present  for  nonlinear 
multicommodity  network  flow  problems. 
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Figure  4.5:  Net  1 

initially  all  flows  traverse  link  21 
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1  Newton  1 

I 

3.7  37092  - 10 1 

14 

117 

J  Approximate  Newton  1 

1  1 

3.7  37745  - 10 1 

15 

30 

J  1-Step  1 

3.7  47400 • 10 1  / 

i 

15 

15 

1  Quadratic  Objective  1  9.7 59996- 106  f  1 

1  iti 

Newton  1  1 

1  i 

I 

1 .521299 • 101 1 
1 

5 

24 

Approximate  Newton  1  1 

1  l 

1 

1 . 521299  •  10 1  1 

i 

13 

27 

1-Step  1  1 

1  1 

1 .521301 • 10 1  1 

1 

16 

16 

1 

k 

Table  4.3:  Net  1,  Computational  Results. 
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Figure  4.6:  Net  2 

Initially  all  flows  traverse  the  paths  starting 
with  an  even  link  number. 
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~l - 1 - T 

1  Initial  1  Final  1 

1  Value  1  Value  1 

1  1  1 

- r 

#  of  1 
Itera-  1 
tions  1 

Total  #  of 
C-G  Subit¬ 
erations 

Low  Load 

1  I"  1 

1  I  I 

1  1  1 

1  1.399349* 102  1  1 

1  1  1 

« 

1 

i 

Nonquadratic  Objective 

1 

1 

Newton 

1  1  1.087936* 101  # 

|  |  1 

13  1 

1 

44 

Approximate  Newton 

1  1  1.087936* 101  1 

1  1  1 

14  1 

1 

34 

1-Step 

1  1  1.120913-101  \ 

i  1  1 

16  1 
l 

16 

Quadratic  Objective 

*  1 .461559-101  )  1 

1  1  t 

1 

1 

Newton 

1  1  9.561953  1 

1  1  1 

3  1 

l 

4 

Approximate  Newton 

1  '  9.561953  1 

1  1  1 

3  1 

1 

4 

1-Step 

1  1  9.561953  1 

1  1  1 

1  1  1 

6  1 

1 

1 

6 

High  Load 

Nonquadratic  Objective 

1  1  1 

1  1  1 

1  J  1 

1  3.863577.10s f  1 

1  1  1 

1 

1 

1 

1 

1 

Newton 

1  1  5. 767141. 101  f 

1  1  1 

12  1 

1 

31 

Approximate  Newton 

1  1  4.690345* 102  i 

1  1  1 

16  1 
| 

16 

1-Step 

1  1  4. 690345* 102  I 

1  1  1 

16  1 

1 

16 

Quadratic  Objective 

1  8.916975.10° 1  1 

1  I  i 

1 

i 

Newton 

1  1  5.967379  1 

1  1  i 

3  1 

1 

7 

Approximate  Newton 

1  1  5.967379  1 

1  1  1 

3  1 

I 

6 

1-Step 

1  I  5.967379  1 

1  1  1 

5  1 

1 

5 

Table  4.6:  Net  2.  Computational  Results. 
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Figure  4.7:  Net  3 

Permisable  paths  are  those  that  do  not  traverse 
the  "circles"  only  in  the  initial  or  final  leg. 
Initially  all  flows  are  directed  on  the  longer 
path  among  two  possible  for  each  OD  pair. 


13 


Cx  =  IB1  .  i  =|_|_J+  1,  j  =  1  -  7(1-1) 


Table  4.7:  Net  3.  Capacities 
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i - r 

1  Initial  1 

1  Value  1 

1  1 

Final 

Value 

T~ 

1  #  of 

1  Itera- 
1  tions 

- f 

Total  #  of  1 

C-G  Subit-  1 
erations  1 

ft 

1  i 

Low  Load  1  1 

I  t 

T - 

1 

i 

l 

1 

l 

■t 

Nonquadratic  Objective  1  2 .301192 • 10 1  f 

1 

1 

1 

1 

• 

Newton  1  1 

1  1 

1 .566526 • 10 1 

1  8 

1 

30 

1 

i 

Approximate  Newton  1 

1. 566526- 101 

f  10 

1 

24 

! 

1 

1-Step  1  * 

1 .568205 • 10 1 

f  16 

16 

1 

i 

I 

Quadratic  Objective  1  1 .86237 5 • 10 1  1 

1 

1 

1 

~t 

Newton  •  1 

1  t 

1.3 19755 • 10 1 

f  7 

13 

1 

I 

Approximate  Newton  1  1 

1.3 19755 - 10 1 

6 

6 

1 

1 

I 

ft 

1-Step  1  1 

1  1 

1  i 

1. 319755* 101 

6 

6 

1 

1 

1 

| 

1  I 

High  Load  1  1 

1 

1 

1 

i 

ft 

**  , 

Nonquadratic  Objective  1  1.1 66363 • 10 5 1 

1  1 

1 

l 

*.* 

Newton  1  1 

1  I 

1.1 13463 - 105 

10 

48 

1 

I 

Approximate  Newton  1  < 

1.113463- 105 

16 

54 

1 

1 

1 

• 

1-Step  1 

1.1 13478 • 105 

16 

i 

16 

1 

l 

- 

Quadratic  Objective  1  3 .077163 • 10 1 1 

I  l 

1  1 

i  i 

1 

1 

i 

Newton  1  1 

1  1 

2.1 35313 • 10 1  ! 

8  ! 

10 

1 

1 

ft 

Approximate  Newton  1  1 

1  1 

2.1 35313 • 10 1 

9  1 

9 

1 

1 

l 

'  .* 

1-Step  1  1 

i 

2.1 35313 • 10 1  | 

9  1 

9 

1 

1  i 

1 

• 

Table  4.9:  Net  3, 

Computational 

Results. 
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5.  Data  Flow-Control 


5 . 1  Introduction 

In  [12]  Gallager  and  Golestaani  Introduced  a  new  approach  to  data  flow 
control.  They  proposed  to  divide  the  task  of  flow-control,  in  the  sense  of 
deciding  whether  to  admit  a  new  packet  into  the  network  or  not,  into  two 
subtasks.  The  two  subtasks  have  complementary  roles,  one  is  concerned  with 
the  relatively  long-term  effect  of  admitted  packets,  while  the  other  is 
concerned  with  the  relatively  short-term  effect. 

The  first  subtask  is  quasi-stat  c  adjustment  of  the  average  rate  of 
session  flows  admitted  into  the  network  to  the  changing  environment  around 
the  network.  This  environment  is  changing  due  to  old  sessions  termination, 
new  sessions  activation,  and  changing  demands  imposed  on  the  network  by 
sessions  in  progress.  These  changes  occur  on  a  rather  large  time  scale,  in 
which  we  can  ignore  the  individual  packet  and  treat  the  streams  of  packets 
as  continuous  flows  which  are  not  accumulated  at  intermediate  nodes.  In 
this  context  we  adjust  the  average  of  the  admitted  flows  so  as  to  strike  a 
balance  between  the  desirability  of  increasing  session  rates,  and  the  need 
to  avoid  congestion  which  may  hamper  the  throughput. 

The  second  subtask  is  instantaneous  adjustment  of  packet  admittance  to 
the  instantaneous,  changing,  state  of  the  network.  This  state  is  changing 
due  to  possible  instantaneous  fluctuations  of  the  flow  admitted  to  the 
network  and  due  to  the  non-determini  Stic  process  which  governs  the  delivery 
of  a  packet  from  its  origin  to  its  destination.  These  changes  occur  on  a 
short  time  scale  in  which  we  deal  with  the  individual  packet.  In  this  con¬ 
text  we  adjust  the  instantaneous  session  rates.  At  times,  when  the  network 
is  momentarily  relatively  far  from  congestion,  we  allow  higher  rates,  while 
at  times,  when  the  network  approaches  congestion,  we  cut  the  rates  down. 
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Certainly,  ultimately,  it  is  the  second  subtask  which  will  determine 
the  average  rate  of  admitted  flows,  since  what  is  done  on  the  short  time 
scale  determines  what  happens  on  the  longer  one.  In  Section  5.2,  the  main 
thrust  of  this  chapter,  we  address  the  problem  of  tuning  the  mechanism  by 
which  the  second  subtask  is  performed,  so  as  to  achieve  the  averages  deter¬ 
mined  in  the  first  one. 

The  mechanism  we  will  consider  is  that  of  "window"  flow-control. 

Defined  in  a  broad  way,  window  flow-control  is  any  mechanism  which 
restricts  the  instantaneous  number  of  packets  in  the  network,  which  are 
associated  with  some  generic,  network  related  entities  (e . g.  links, 
sessions,  network),  from  exceeding  some  prespecified  values.  Thus,  for 
example,  the  isarithmic  flow-control  scheme  [21]  can  be  considered  as  a 
particular  implementation  of  a  window  associated  with  the  whole  network. 
Similarly,  flow-control  schemes  which  are  based  on  buffer  thresholds,  can 
be  considered  as  a  window  associated  with  a  link.  Certainly,  schemes  which 
are  based  on  the  "intersection"  of  several  windows  are  also  possible. 

in  its  narrower  and  commonly  used  meaning  [21],  window  flow-control 
refers  to  the  scheme  by  which  the  number  of  unacknowledged  packets  for  each 
session  is  restricted  from  exceeding  a  prespecified  value.  In  [36], 

Sal  lager  suggests  that  exercising  window  flow-control  by  windows  associated 
with  entities  which  are,  in  some  sense,  more  detailed  than  sessions, can 
provide  means  of  achieving  dynamic  routing.  In  particular  when  routing  is 
done,  as  in  this  Thesis,  by  the  use  of  path  flow  variables,  a  window  can 
be  associated  with  every  session-path  pair.  To  see  why  such  a  window  can 
provide  dynamic  routing,  we  draw  an  analogy  from  the  isarithmic  flow- 
control  scheme. 


A  serious  objection  to  the  use  of  isarithmic  flow-control  as  an 
implementation  of  a  global,  network  wide  window,  is  that  when  a  local 
congestion  arises,  the  scheme  reduces  also  the  input  rate  of  those  sessions 
which  do  not  affect  the  congested  area.  This  objection  applies  to  session 
windows  also.  A  session  that  uses  several  paths  should  not  be  cut  back  if 
only  one  of  these  paths  is  congested.  Rather,  the  uncongested  paths  should 
be  utilized  more  while  reducing  the  utilization  of  the  congested  path. 

This  can  be  more  closely  achieved  by  using  session-path  windows. 

In  Section  5.3  we  will  take  a  step  toward  demonstrating  that  session- 
path  windows  provide,  to  some  degree,  dynamic  routing.  In  the  next  section 
we  propose  an  iterative  algorithm  for  calculating  the  sizes  of  either 
session  windows  or  session-path  windows  which  give  rise  to  desired  session 
rates  or  path  rates,  respectively. 

5 . 2  An  Algorithm  for  Adjusting  Windows  to  Rates 

In  this  section  we  propose  an  algorithm  by  which  we  can  find  a  window 
for  each  session  that  results  in  a  set  of  prespecified  average  input  rates. 
The  result,  evidently,  applies  to  the  session-path  window  since  each 
session-path  pair  can  be  considered  as  a  session. 

Unlike  data  routing  in  which  we  assume  approximate  delay  functions  and 
use  their  first  and  second  derivatives,  windows  and  input  rates  are  bound 
together  through  the  real  delay  functions.  One  should  feel  uneasy  in 
assuming  these  functions  are  known  exactly,  let  alone  take  their  first  or 
second  derivatives.  Below  we  propose  an  algorithm  which  will  adjust  win¬ 
dows  to  achieve  prespecified  rates  without  the  exact  knowledge  of  the  delay 
functions. 
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Let  S  be  the  set  of  sessions  and  assume  a  certain  fixed  routing  is  given. 
Evidently ,  under  fixed  routing,  the  average  flow  on  each  link  a,  out  of  the 
set  of  links  £,  is  a  function  of  the  vector  of  average  session  inputs 
r^( r 1  ,r2 .  Let  D  (r)  denote  the  average  time  that  elapses  from  the 
moment  session  s  launches  a  packet  into  the  network  until  an  acknowledge¬ 
ment  of  its  receipt  arrives  back.  Let  v  =  (...,  vs,...)'  denote  the  vector 
of  windows  where,  vs  is  the  window  allocated  to  session  s.  Assuming  that  a 
packet  is  always  available  to  be  sent,  the  relation  between  rs  and  vs  using 


Little's  formula  [39],  satisfies 

vS  =  rS*Ds(r) 

Let  ft  be  defined  by 

ft  =  {r|-*S  >  0,  Ds(r)  < 

and  denote  the  mapping  defined  by  (5.2.1) 
[35],  [12]  that  under  various  assumptions 

Ds(r)  >  8  >  o 

for  some  B  >  0, 

3D  (r)  3D  (r) 

s  s _ 

s  >  s‘ 
ar  3r 

the  Jacobian  of  h,  denoted  by  P(r)  satisf 

P(r)  =  T(r)  +  R(r)B 1 M(r 


V  seS  .  (5.2.1) 

> 

00  7  SeS) 

r 

by  h :  R 1 5  I  ■+•  R I  $  I  .  It  turns  out 
includinn  the  assumption  that 

I 

7  reft,  7  seS  (5.2.2a) 

Vs,  seS  ,  (5.2.2b) 

es 

)B  (5.2.3) 


where  T(r)  denotes  a  diagonal  matrix  whose  (m,m)  entry  is  D^r),  R(r)  deno¬ 
tes  a  diagonal  matrix  whose  (m,m)  entry  is  H11,  M(r)  is  a  diagonal  matrix 
and  B  is  a  matrix  which  is  function  of  the  routing  only.  It  is  shown 
there  [35]  that  P(r)  is  nonsingular  on  ft  and  that  the  mapping  h:ft*h(ft)  is 
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one  to  one  and  therefore  h-1:h(  exists  and  by  the  Inverse  Function 
Theorem  [39],  is  continuously  differentiable.  We  add  here  the  following 
proposition  which  will  proved  in  Appendix  (5. A). 

Proposition  (5. A):  Given  that  h:S>h(ft)  is  such  that  P(r)  is  non¬ 
singular  on  ft,  Ds(o)<0°  and  D  (•)  is  convex  continuous  and  differentiable  on 
ft  for  all  seS,  then  under  assumption  (5.2.2)  h (fi)  =  (R  ')  ,  where 

(R|S,)+  =  {r|rs  >  0  V-seS}. 


With  Propositon  (5. A)  and  the  existence  of  h"1  which  is  continuously 
differentiable,  we  can  try  to  minimize  a  function  J  (h  ( v ) )  on  the 
nonnegative  orthant. 

Since  h,  as  mentioned  above,  is  not  available,  one  might  hope  that 

★ 

A vs  =  -  t  seS 

3rS 


is  a  descent  direction.  This  is  false  in  some  cases  as  the  following 
example  shows,  (cf.  [38]  page  67). 

Consider  a  network  of  one  link  (1,2)  and  two  sessions  originating  at 
node  1  and  terminating  at  node  2,  as  in  Figure  (A.l) 


D( r1  +  r^) 


2 

4 


12  12  12 

Assume  D^(r  +r  )  =  D^fr+r  i  =  D(r  +r  )  where  D  is  of  the  Kleinrock  type, 

D(rX+r2)  =  - Uy 

C-r  -r£ 

and  C  is  the  capacity  of  link  (1,2).  If  r*  is  very  close  to  the  capacity 

2  12 
and  r  is  zero  then  increasing  both  windows  v  and  v  will  virtually  not 

1  2 

increase  the  total  flow  r  +r  .  It  will  just  change  the  components  of  the 
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I 


flow  to  satisfy 

£  £ 

r2  =  v2 

Thus  r2  will  increase  at  the  expense  of  an  almost  identical  decrease  in  r1, 
As  a  result  if 


we  get  that 


3J*  3J* 

1  <  ?  <  0 
3r  3r 


implying 


•Ar^  =  Ar2  =  Ar  >  0 


aJ*  *  Ar(i^  -  i^L)  >  0 
3r2  sr1 


Gal  lager  [12]  showed  that  a  diagonal  scaling  of  vrJ*  by  R(r)  results  in 
a  descent  direction  in  the  window  space.  To  see  this  let 


av  =  -nR(r)vrJ  . 


It  is  known  that  v  J*  and  v  J*(h_1M)  can  be  connected  through 

r  v 

V*(r>  *  P‘  Ch"l{v))  •  7vJ*(h‘1(y))ly  .  h(r). 
Using  (5.2.3)  and  the  equations  above  we  get 


(5.2.4) 


vvJ*(h"1(v))'*Av  =  -nVvJ*(h"1(v))'k(r)vrJ*(r)= 


(5.2.5) 


=  -nVvJ*(h_1(v))  •R(r)P[h-1(v)],v  J*(h_1(v))= 

=  -nVvJ*(h_1(v)) ' [R(r)T(r)  +  R(r)B'M(r)BR(r) ]  VvJ*(h_1(v)) 


>  0 
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Similarly,  when  VvJ*(h"l(v))  is  available  taking 

a r  =  -nR(r)v  J*(h-1( v ) ) 
results  in  v 

V  J  •  Ar  *  -nV  J  *R(r) • V  J  (h  ^ ( v ) )  (5.2.6) 

r  r  v 

=  -n(P(r)'vv(J*-h“1)),R(r)‘Vv(J*.h"1) 

*  -1  *  -1 
=  -nVv(J*h  ) *P(r)R(r) •  vv( J  *h  ) 

*  i  *  i 

=  -n7v(J*h’V[T(r)R(r)  +  R( r ) B ' M( r ) BR( r )  ]  vy(  J -h"1) 

>  0. 

The  last  inequalities  in  (5.2.5)  and  (5.2.6)  hold  because  P(r)R(r)  is 
nonnegative  definite  on  ft  0 ( R < S I )+ .  Moreover,  the  inequalities  are  strict 
on  ft  0  [ i nt( R 1 s 1 )  j,  since  there  P(r)R(r)  is  positive  definite. 

Let  7>o  be  a  desirable  vector  of  average  input  rates.  Take  J*  to  be 

J  (r)  =  ir  -  m  . 

If  r0  is  such  that  the  ball  of  radius  u"F-ro u  does  not  intersect  the  axes, 
then  within  this  ball  the  eigenvalues  of  P(r)R(r)  are  bounded  away  from 
zero,  and  since  we  have  a  descent  direction,  then  by  taking  fixed  small 
enough  n  or  by  using  an  Armigo  stepsize  rule  (cf.  Chapter  4)  in  iteration 
(5.2.4)  we  are  guaranteed  to  converge  to  a  7  such  that  7=h-1(7). 

If  an  entry  of  7  equals  zero,  we  take  the  corresponding"^  to  be  zero. 
It  is  not  difficult  to  realize  that  we  are  now  back  at  the  previous  case 
but  in  a  space  of  smaller  dimension.  Similar  arguments  hold  to  achieve  7 
such  that  7=h(r) . 

In  practice  to  achieve  a  desired  7  we  use  an  approximation  to  D  (r)  to 
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find  a  point  v0  close  to  v,  and  then  we  iterate,  using  small  n.  Notice 
that  such  an  iteration  requires  only  the  monitoring  of  the  average  rate  of 
the  input  traffic. 

When  we  want  to  have  a  window  per  session-path  then  to  achieve  a 
desired  path  flow,  we  treat  each  session-path  as  a  separate  session.  If  a 
session  utilizes  several  paths  and  a  packet  arrives  when  some  of  the  paths 
are  ready  to  accept  it,  we  put  the  packet  on  a  particular  path  with  prob¬ 
ability  which  is  proportional  to  its  desired  path  flow. 

5.3  Dynamic  Routing  by  Session-Path  Window 

In  this  section  we  are  going  to  look  at  flow  control  from  the  strict 
point  of  view  of  whether  it  can  provide  means  of  exercising  dynamic 
routing.  To  this  end  we  will  assume  a  network  with  infinite  buffers.  The 
question  we  address  is  whether  in  such  a  network  a  window  flow  control 
mechanism  has  any  possibility  of  reducing  delay. 

Traditionally,  packet  delay  is  measured  as  the  time  that  elapses  from 
the  moment  a  packet  is  launched  into  the  network  to  the  moment  that  it 
arrives  at  the  destination.  With  flow  control,  packets  may  have  to  wait  at 
the  host  node  until  they  can  be  laurched  into  the  network.  Evidently  this 
will  not  happen  without  flow  control.  To  compare  the  two  cases,  we  assume 
a  given  packet  arrival  rate  to  a  host.  The  host  has  an  infinite  buffer 
where  it  stores  the  arriving  packets  until  they  can  be  transmitted.  We 
will  measure  delay  as  the  time  that  elapses  from  the  moment  a  packet  arri¬ 
ves  at  the  host  until  it  is  delivered  to  the  destination. 

Consider  the  network  in  Figure  (5.2), 
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Fig.  (5.2) 

We  model  this  network  as  a  buffer  feeding  two  networks  of  queues,  each 
network  consisting  of  four  tandem  queues.  The  queues  capture  the  inter¬ 
fering  traffic  which  is  omitted  in  Fig  (5.2).  We  assume  all  queues  are 
identical . 

2  3  4  5 

-] - * - ] - - - ] - 

path  1 

3'  4'  5' 

path  2 

Fig.  (5.3) 

Modeling  node  1,  just  as  a  buffer,  amounts  to  the  assumption  that  the 
transmission  delay  on  links  (1,2)  and  (1,2')  is  negligible  when  compared  to 
each  path  delay. 

With  no  flow-control,  packets  which  arrive  at  buffer  1  are  immediately 
sent  to  queue  2  or  2'.  We  assume  this  is  done  in  a  Round  Robin  fashion.  We 
compare  this  v,!th  the  flow  controlled  case,  in  which  the  total  number  of 
packet  at  queues  2  to  5,  corresponding  to  path  1,  as  well  as  the  number  of 
packets  at  queues  2'  to  5',  corresponding  to  path  2,  ire  restricted  from 
exceeding  some  value  v.  A  pith  will  be  said  to  be  blocked  when  the  number 
of  packets  on  it  equals  v.  We  assume  that  when  no  path  is  blocked,  packets 
are  sent  to  the  two  paths  in  a  Round  Robin  fashion.  When  the  two  paths  are 
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blocked,  packets  are  queued  at  buffer  1,  and  when  only  one  rath  is  blocked, 
packets  are  sent  to  the  path  which  is  not  blocked.  Let  d  be  the  average 
packet  delay  encountered  in  the  uncontrolled  case.  We  conjecture  that  the 
behaviour  of  the  average  delay  per  packet  d  as  v  changes,  is  as  shown  in 
Figure  (5.4). 


Fig{5.4) 

This  behaviour  can  be  explained  as  follows.  For  small  v  there  is  a  high 

probability  that  when  the  two  paths  are  blocked,  queues  2  and  2'  are  empty. 

Thus  by  not  sending  a  packet,  waiting  at  buffer  1,  we  do  not  perform  work  on 

a  "customer"  when  otherwise  we  could,  and  the  system  might  be  unstable. 

Once  we  take  v  above  some  threshold  v  the  system  becomes  stable.  Then, 

th 

when  only  one  path  is  blocked  we  gain  the  information  that  the  path  is 
"congested"  and  flow  is  diverted  to  the  other  path  until  the  "congestion" 
subsides.  The  ratio  of  the  probability  that  ex?'  ly  one  path  is  blocked  to 
the  probability  that  the  two  paths  are  blocked  simultaneously  and  queues  2 
and  2'  are  empty,  grows  to  infinity  as  v  increases  and  therefore  the  rela¬ 
tive  frequency  of  the  "good  effect"  by  far  exceeds  that  of  the  "bad 
effect".  This  justifies  the  reduced  delay  compared  to  the  uncontrolled 
case. 
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This  model  also  gives  an  indication  about  how  to  distribute  the  windows 
when  exercising  session-link  windows.  Certainly  a  path  with  all  packets 
concentrated  at  the  first  queue  is  more  congested  than  a  path  with  the 
same  number  of  packets  all  concentrated  at  the  last  queu.  Intuitively, 
since  we  try  to  get  information  so  as  to  equalize  the  average  amount  of 
work  remaining  in  each  queue,  we  should  have  the  windows  at  queues  2  and  2' 
much  smaller  than  those  at  queues  5  and  5'.  In  the  context  of  many  OD 
pairs  this  will  have  the  effect  of  giving  "priority"  to  flow  which  has 
already  undergone  large  delay  compared  to  a  flow  which  has  just  entered  the 
network . 

To  summarize,  we  have  shewn  in  this  section  that  the  claim  in  [36]  may 
be  plausible.  Analyzing  this  claim  in  more  detail  is  a  topic  for  further 
investigation. 
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6.  Integrated  Network 
6 . 1  Introduction 

We  consider  three  types  of  networks.  The  first  handles  data  only, 
the  second  handles  voice  only,  and  the  third  handles  voice  and  data 
together.  Correspondingly  we  have  three  sections.  In  the  first  we  propose 
the  implementation  of  the  data  routing  and  the  data  flow  control  algorithms 
of  the  previous  chapters  so  as  to  achieve  a  reasonable  objective.  In  the 
second  we  do  the  same  for  voice,  and  in  the  third  we  propose  a  way  of 
operating  the  four  algorithms  in  a  single  network. 

For  a  data  only  network,  the  joint  routing  and  flow-control  algorithm 
we  propose, follows  the  suggestion  made  by  Golestaani  in  [35].  The  only 
difference  is  our  use  of  the  particular  routing  algorithm  introduced  in 
Chapter  4,  and  the  adjustment  of  windows  by  the  Iterative  algorithm, 
suggested  in  Chapter  5. 

For  a  voice  only  network,  the  joint  routing  and  flow-control  algorithm 
is  designed  to  result  in  a  fair  allocation  in  the  sense  defined  in  Chapter 
3,  of  rates  to  sessions.  Here  like  in  Chapter  2,  we  assume  each  session  is 
routed  on  a  virtual  circuit  which  lasts  for  the  whole  duration  of  the 
session,  and  the  routing  decision  is  the  allocation  of  a  path  to  an 
incoming  session  requesting  one.  We  use  an  objective  function  whose  mini¬ 
mization,  similar  to  a  penalty  function,  results  in  a  fair  allocation  in  an 
asymptotic  sense. 

Finally,  for  a  network  which  handles  voice  and  data  simultaneously,  we 
propose  the  concurrent  use  of  the  previous  routing  and  flow  control  algor¬ 
ithms.  To  get  them  to  interact  in  a  meaningful  way  we  let  the  voice  take 
priority  over  the  data  within  the  queues  and  we  let  the  data  make  use  of 
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the  capacity  left  over  by  the  voice.  The  rates  allocated  to  voice  sessions 
are  determined  according  to  artificially  calculated  link  capacities.  These 
artifical  link  capacities  are  calculated  so  as  to  strike  a  balance  between 
the  maximal  rate  assigned  to  a  voice  session  on  a  particular  link  and  the 
delay  the  data  experiences  on  the  same  link.  We  conclude  with  a  discussion 
of  various  features  of  this  algorithm. 


6 . 2  Combined  Data  Routing  and  Flow-Control 

In  [35],  Golestaani  proposed  adjusting  routing  and  inputs  so  as  to 

strike  a  balance  between  a  penalty  for  reducing  inputs  below  the  values 

demanded  by  users,  and  a  penalty  for  increasing  the  network  congestion. 

Specifically,  letting  r*  represent  the  value  of  the  average  input  demanded 

w 

between  origin  and  destination  (00)  pair  weW,  Golestaani  suggested 
replacing  problem  (4.7.5)  by  the  problem 


Min  D( Ex)  +  l  e  (r  ) 
weW  w  w 


subject  to 


l  *  : 

PePw 

xP  >  0 
0  <  rw 


r 


w 


* 

<  rw 


(6.2.1) 

T  weW, 

(6.2.2) 

T  pePw,  weW, 

(6.2.3) 

T  weW, 

(6.2.4) 

where  all  notations  carry  over  from  problem  (4.7.5),  and  e  (•)  is  a  nonneg- 

w 

ative  nonincreasing  convex  function  on  the  interval  [0,ru]  for  all  weW. 

He  then  proposed  to  rewrite  (6.2.2)  -  (6.2.4)  as 


l  xP  +  y 
peP  w 


w 


* 

r 

w 


*t  weW  (6.2.5) 
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(6.2.6) 


X*5  >  0  y  >  0  Y  peP  ,  weW 

w  w 

and  correspondingly  replace  e  (r  )  by  g  (y  )  for  all  weW  where 

w  w  w  w 

★ 

-  ew(rw  -  yw ) .  (6.2.7) 

It  can  then  be  easily  seen  that  the  problem 

Min  D(Ex)  +  l  gJyJ  (6.2.8) 

weW^  w 

subject  to  (6.2.5)  -  (6.2.6)  is  equivalent  to  problem  (6.2.1)  -  (6.2.4) 
through  the  transformation 

xp  =  x13  T  p£Pw>  weW, 

* 

yw  =  rw  -  rw  T  weW» 

Problem  (6.2.8)  can  now  be  viewed  as  a  routing  problem  with  separable 
objective  function  to  which  the  algorithm  of  Chapter  4  applies.  Morever  in 
problem  (6.2.8)  we  have  the  same  00  pair  set  as  is  problem  (4.7.5);  therefore 
it  is  easily  seen  that  the  process  of  solving  (6.2.8)  can  be  simulated  on 
the  network  which  corresponds  to  problem  (4.7.5).  This  simulation  process, 
when  done  using  the  algorithm  of  Chapter  4  can  be  seen  to  preserve  all  the 
decomposition  and  distributed  computation  properties  mentioned  there. 

Thus  the  algorithm  will  result  in  a  sequence  of  inputs  and  path  flows  con¬ 
verging  superli nearly  to  the  solution  of  problem  (6.2.1). 

The  fact  of  the  matter  is  that  our  algorithm  is  not  intended  to  solve 
problem  (6.2.8)  in  the  abstract  but  rather  to  serve  as  a  quasi-static 
algorithm  in  a  network  in  which  we  deal  with  flows  and  inputs  consisting  of 
actual  packets  sent  by  users.  In  this  context  it  is  the  flow-control  we 
exert  on  inputs  that  will  determine  the  average  input  rates.  Thus  we  have 
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to  adjust  our  flow-control  parameters  from  iteration  to  iteration  so  as  to 
achieve  the  average  input  rates  dictated  when  solving  (6.2.8). 

When  exerting  the  flow-control  on  the  inputs  by  windows,  either  for 
each  session  or  for  each  session-path  pair,  we  can  adjust  the  windows  to 
achieve  a  desired  input  rates  using  the  algorithm  suggested  in  Chapter  5. 
Thus  when  a  window  per  session  is  employed,  we  first  adjust  the  routing 
fractions  of  the  outgoing  paths  and  then  iterate  on  the  session  windows  to 
achieve  the  desired  average  rate;  when  a  window  per  each  session-path  pair 
is  employed,  we  determine  the  new  desired  path  flows  and  then  iterate  on  the 
session  path  windows  to  achieve  these  path  flows. 

In  order  that  the  window  iteration  will  not  follow  the  congestion 
sample  path  and  by  that  exacerbate  a  dangerous  situation,  the  time  between 
two  window  iterations  should  not  be  too  short.  On  the  other  hand,  this  time 
should  not  be  too  long  in  order  that  we  execute  more  window  iterations  in 
between  two  routing  updates.  A  reasonable  time  in  between  two  window 
iterations  is  that  which  is  comparable  to  the  time  it  takes  a  temporary 
congestion  to  subside.  Correspondingly,  a  reasonable  time  in  between  two 
routing  updates  is  one  which  allows  for  a  number  of  window  iterations  which 
results  in  inputs  which  are  acceptable  approximations  to  the  desired  ones. 
Notice  that  the  time  between  routing  updates  should  not  be  too  short 
anyway,  since  in  the  quasi -static  approach  we  should  measure  flow  on  a  time 
scale  on  which  the  accumulation  of  flows  at  intermediate  nodes  can  be 
neglected, 

6.3  Combined  Voice  Routing  and  Flow-Control 

In  chapter  3  we  have  introduced  a  voice  flow-control  algorithm  which 
allocates  rates  to  sessions  so  as  to  yield  a  fair  allocation,  in  a  sense 
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explained  there,  over  the  set  defined  by  (3.3.6)  -  (3.3.8).  When,  in  addi¬ 
tion,  we  have  the  freedom  to  determine  routing  also,  some  of  the  parameters 
in  (3.3.6)  -  (3.3.8)  are  not  fixed  anymore,  but  are  to  be  determined  by  the 
routing.  Thus  the  set  defined  now  by  (3.3.6)  -  (3.3.8)  is  larger  than 
oefore.  A  natural  objective  for  a  joint  routing  and  flow  control  algorithm 
will  be  to  find  a  fair  allocation  of  rates  over  this  larger  set. 

Specifically,  retaining  the  notations  used  in  Chapter  3,  every  session 
seS  is  associated  with  an  origin-destination  (OD)  pair  weW.  Abusing  nota¬ 
tion,  we  let  w  denote  also  the  set  of  all  seS  which  are  asssociated  with 
weW,  then  the  objective  of  the  joint  voice  flow  control  and  routing  is  to 
find  a  fair  allocation  (...,f  ( ys ),...)'  over  the  set  of  vectors 

( . . . ,f_1 (ys) , . . , ) '  such  that 

zp  *  *s  )  *  9a  (ca  "  F  Wp  (a)*zp  Yp  ePw:  sew 

s  s  s  ^s  (6.3.1) 

Y  seS,  Y  asf 


l  zp  =  1  Zp  £{0,1} 

s  s 

PsePw:  sew 

Fa  <;  c  ,  ys  >  0  Fa  =  £  yS.]  (a).z 
a  seS  Ps  Ps 


Vp  ePw:  sew,  YseS,  Y  weW 
s 

(6.3.2) 

Yp  sP  :  sew,  YseS,  YweW 

Ks  w 


(6.3.3) 

From  relation  (6.3.1)  it  is  not  difficult  to  see  that  there  exists  ni 
such  that  for  all  m  >  m  minimizing 


l  [9a(c  -  Fa)  j-m 


a£f 


(6.3.4) 


subject  to  (6.3.2),  (6.3.3)  and  to 

zp  *  Ip  (a)'Ll(yS)  =  min  g  (c  -  Fa).l  (a)z  Y  seS,  ac£, 
s  Ps  S  a : lp(a )=1  a  a  Ps  Ps 


» 


» 


» 
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(6.3.5) 


will  solve  the  new  fair  allocation  problem.  A  similar  observation  was  made 
by  Gal  lager  in  a  different  context  [26]. 

We  have  already  seen  in  Chapter  2  that  by  routing  new  incoming 
sessions  on  the  minimal  marginal  delay  path,  we  obtain  under  various  assump¬ 
tions  detailed  there,  almost  the  same  effect  as  if  we  had  the  possibility 
of  rerouting  virtual  circuits.  Thus  our  rule  for  routing  will  be  to 
measure  c  -  Fa  periodically,  update  the  shortest  marginal  delay  path  with 

d 

respect  to  the  delay  function  (6.3.4)  using  some  fixed  large  m,  and  then, 
in  between  updates,  direct  all  incoming  new  sessions  to  the  recently 
chosen  paths. 

This  routing  algorithm  will  work  provided  that  the  c  -  Fa  we  measure 

d 

corresponds  to  a  fair  allocation  subject  to  the  current  routes  so  that 
(6.3.5)  holds.  This  will  be  approximately  true  if  we  employ  the  flow- 
control  algorithm  of  Chapter  3  and  we  assume  that  it  maintains  the  rates 
close  to  the  fair  allocation  one  at  all  times,  as  a  good  quasi-static 
algorithm  should  do. 

6.4  Combined  Voice  and  Data 

We  consider  a  network  serving  both  voice  and  data  together.  Because 
of  the  hard  constraints  on  voice  delay,  we  asssume  that  the  voice  packets 
are  given  priority  in  the  queues  over  the  data  packets.  Thus  as  far  as  the 
voice  is  concerned,  the  data  does  not  exist.  This  may  cause  the  data  to  oe 
squeezed  out  of  the  network  by  the  voice.  In  an  extreme  case  the  voice 
might  obtain  an  encoding  rate  above  ear's  resolution  at  the  expense  of  the 
data.  To  prevent  this  from  happening  we  impose  ar  additional  constraint  to 
(6.3.1)  -  (6.3.3)  and  assign  the  voice  rates  so  as  to  achieve  a  fair 
allocation  over  the  resulting  smaller  set. 
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Let  °a(f0»  Fa)  denote  an  approximation  to  the  delay  experienced  by  the 
data  on  link  aeL  when  the  average  voice  traffic  on  link  ae£  is  Fa  and  the 
average  data  traffic  on  that  link  is  fa.  A  reasonable  compromise  between 
data  delay  and  voice  rate  can  be  achieved  by  imposing  the  condition 

e( f"1 (ys) )  *1  (a)  >  D  (fa,Fa).lD(a)  ae£,  Y  seS  (6.4.1) 

s  p$  a 

where  e(»)  is  a  nonnegative  monotonically  decreasing  function,  approaching 
infinity  as  its  argument  reduces  to  zero.  Since  f^1  is  monotonically  non¬ 
decreasing,  (6.4.1)  bounds  ys  on  linh  aeL  as  a  function  of  the  delay 
experienced  by  the  data  on  that  link.  We  now  propose  a  joint  flow-control 
and  routing  algorithm  for  voice  and  data  which  will  satisfy  (6.4.1).  For 
voice,  the  objective  of  the  joint  routing  and  flow-control  algorithm  will 
be  to  achieve  a  fair  allocation  of  rates  over  the  set  defined  by  (6.3.1)  - 
(6.3.3)  together  with  (6.4.1).  For  data,  the  objecti/e  will  be  to  minimize 
(6.2.8)  where  Da(fa,Faj  replaces  Da ( f 3 )  in  (4.7.5). 

To  achieve  these  two  interacting  objectives,  we  propose  to  employ  the 
joint  routing  and  flow-control  algorithms  of  the  previous  data  and  voice 
sections,  for  data  and  voice,  respectively.  The  voice  rate  will  be 

influenced  by  the  data  delay  by  viewing  c  in  condition  (6.3.1),  (6.3.3)  as 

a 

a  parameter  c  .  From  (6.3.1)  and  (6.4.1),  at  equilibrium  we  have 
a 

Da(fa,Fa)  <  e(ga[ca  -  Fj)  T  ae£.  (6.4.2) 

Using  Assumption  (3. A)  on  g  we  observe  that  the  composition  of  g  with  e 

«  a 

can,  without  loss  of  generality,  be  denoted  by  e  again  since  it  satisfies 

a 

the  properties  attributed  to  e,  above.  We  propose  substituting  different 
ca  from  iteration  to  iteration  according  to  the  algorithm 


(ca)k+i  *  »axtO,*1n(ca,(c~a)k  +  \[ea((c~)k  -  F»)  -  Oa(f®  ,F«)  ])}  {6.4.3) 


Y  ae£,  k=0,l . 

~a  ~a 

for  some  0  <  <  1.  The  determination  of  , as  well  as  the  exact  model 

and  assumptions  under  which  the  resulting  combined  voice  and  data  algorithm 

will  converge,  is  a  subject  for  further  research. 

On  the  intuitive  level,  the  working  of  the  algorithm  should  be 

apparent.  Assume  we  are  currently  at  equilibirum.  If  the  penalty  e  (r  ) 

w  w 

for  an  OD  data  pair  weK  is  increased  by  a  small  increment, the  flow  on  the 

paths  utilized  by  w  will  increase  accordingly.  3y  the  data  routing 

algorithm  this  increase  will  be  distributed  among  the  paths  as  to  equalize 

the  marginal  data  path  delays.  By  iteration  (6.4.3)  the  increase  in  the 

data  delay  will  cause  a  reduction  of  c  .  This  by  the  fair  allocation 

a 

algorithm  for  voice,  will  result  in  a  decrease  in  Fa.  Morever  the  total 
change  in  c  -  F  will  be  negative  by  Assumptions  (3. A),  (3.B)  and  the  fair 

a 

allocation  property.  Thus  we  obtain  a  new  equilibirum  in  which  the  data 
delay  was  increased  by  an  increment  and  the  voice  rate  was  decreased  by  an 
increment.  Moreover,  when  the  number  of  voice  sessions  on  a  particular 
link  increases,  it  causes  an  increase  in  Fa  and  a  new  equilibirum  is 
obtained  at  a  higher  Fa,  c  and  higher  data  delay.  The  higher  data  delay 
on  the  other  hand  causes  a  data  input  reduction. 

It  can  be  observed  that  the  algorithm  maintains  the  distributed  nature 
similar  to  the  the  other  algorithms  in  this  Thesis  since  iteration  (6.4.3) 
can  be  implemented  by  each  link  using  only  variables  local  to  the  link. 
Moreover  the  algorithm,  by  dynamically  allocating  the  capacity,  trades  off 
between  voice  and  data  where  this  tradeoff  takes  into  consideration  the 
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priority  functions  of  the  various  data  and  voice  sessions.  The  reader  can 
easily  convince  himself  of  this  by  examining  the  process  described  in  the 
previous  paragraph  using  different  priority  functions. 
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Appendix  (2. A) 


Proof  of  Lemma  (2. A) : 

For  all  weW  let  peP  denote  the  path  such  that  UP(kAt)=l,  then  if  x  is 

w 

such  that  for  all  weW,  xP  peP  satisfies 

w 

XP  =  I  V*1 

0  peP  ,  p*p  , 

w 

it  solves  the  problem  in  the  lemma  since  for  all  weW  there  exist  constants 
qw  such  that  for  all  peP 

w 

„  D(Ix(kAt) )  V( x( k At) )  w 
3  =  3  .  d  +  9  » 


i.e.  peP  satisfies  also 
w 

3V(x(k  At)  ) 
p  *  arg  min  - 

peP  3xP 

w 


Now,  it  can  be  seen  that  every  stochastic  process  in  (2.2.3)  corresponds  to 

an  exponential  random  variable, and  thus  by  the  memoryless  property  <  the 

exponential  distribution  and  the  independence  of  these  exponential  random 

variables, we  get  for  peP  ,  p/p 

w 


E  (  xK'(  [k+1  ]  At )/x( k At) ^  = 


xP(kAt)*e 


(2.A.1) 


To  find  the  expression  for  p=p”  we  first  find  the  average  of  the  number  of 
arrivals  at  w  between  kAt  and  ( k + 1 ) At  which  do  not  terminate  before 
(k+1) At.  Denote  this  number  by  Z  ,  and  the  number  of  arrivals  by  ¥,  then 


E(7  )  =  E 
w  IT 


A  (k+1) At  -  A  (kAt)  =  W) 
w  w 


(2. A. 2) 
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■  e/V  •  1  =  e  (V  •  1  *  (1  -  e~'i£A)S) 

1f\  0  1  TT  \  viffET  ) 


1.  -  u  At  -1  -uAt 

VAt  TTa^1  -  e  )  =  Vn  (1  -  e  ) 


where  the  third  equality  follows  because,  given  the  number  of  Poisson 
arrivals  in  an  interval,  then  the  arrival  times  are  uni  formally  distributed 
in  the  interval.  Thus  for  p  =  p"  we  have 


xP( [k+1 J  At) /x( k At) 


=  xP(k  At)  *e'"^At  +  X  e~yAt). 


w 


(2. A. 3) 


From  (2.A.1)  -  (2. A. 3)  we  get  for  p*p 


E(xP[(k  +  I)  At  J  -  XP(kAt)/x(kAt)) 


while  for  p=p 


-  (e_l,4t  -  l)xP(kAt)  •  (1  -  e~wAt> ( x**  -  xP(kit)) 


xP[(k+l)4t]  -  xP(kAt)/x(kAt) 


Uwu_1  -  x^kitUtl  -  e-“4t)  =  (1  -  e'1,4t)(xP  -  xp(k4t)) 

Q.E.D. 


Proof  of  Lemma  (2.B) : 

For  peP  p^p",  using  the  fact  that  the  variance  of  a  Bernoulli  random 
w 

variable  with  probability  of  success  p,  is  pq  we  get 


E 


(  xP  [  (k+l )  At  J  -  x(kAt)  )2/x(kAt) 


(2. A. 4) 


xp[(k+l)Atj  -  x(kAt)/x(kAt) 


+ 
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02{xP(  [k+1  ]  At)  “  x(kAt)/x(kAt) ) 


xP(kAt)(l  -  e_yAt)2  +  xP(kAt)e-nAt(l  -  e-uAt) 


while  for  peP  ,  p=p  we  get  with  Z  and  7T  as  defined  in  the  proof  of 
w  w 

Lemma  (2. A), 

E  jV([k+l]At)  -  x^(kAt) )Vx^(kAt) 


(2. 


E(Zw)  +  E 


( xp(  [k+1  ]  At  -  XP(kAt)  )2/x(kAt)  ,  Zw=0 


o 

For  E(Z  )  we  have 
w 


e<4>  •  %  -  % 


1  -  y  At  2 

(TT-^d  -  e  )) 


TT-Ul  -  e'^fl 

U  At  \ 


1  -  e 


-yAt 


yAt 


2  (  ]_  At  \  2 

*  L xwAt  +  (xwAt)  ](  yAf  (1  “  e  U  )  )  + 


1  _  e"yAt  (  1 

+  xwAt<i“ll!t  >  (  1  -  - 


-  e~yAt 
yAt 


For  the  second  term  in  the  right  hand  side  of  (2. A. 5)  we  have  the  same 
expression  as  (2. A. 4)  and  Lemma  (2.B)  follows. 


Q.E.D. 


A. 5) 
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Appendix  (3. A) 

Proof  of  Lemma  (3. A) 

We  prove  (3.4.5)  inductively.  Assume  satifies  (3.4.5).  Then  it  can 
be  easily  verified  from  (3.4.3)  and  (3.4.4)  that 

0  <  a®  <  1.  (3.A.1) 


Also  by  (3.4.2) 


min  [(1  -  aa)ys  +  aaf  g  (c 
a  s.t.  1  (a)=l  k  k  k  s  a  a 
Ps 


-  I  tf1  «>))]• 

teS  k  Pt 


(3.A.2) 


Since  by  assumptions  (3. A),  the  induction  hypothesis  and  (3.A.1)  the  two 
terms  in  the  R.H.S.  of  (3. A. 2)  are  nonnegative  we  get 


Vl 


>  o. 


Let  t^SY^lpt^  be  den0ted  by  Fk  and  1et  I  ft9atca  "  ( * ) ] !p  (a) 

be  denoted  by  G,(  ).  Then  G  (  )  is  convex  decreasing  and  (3.4.3)  and 
a  a 

(3.4.4)  can  be  written  as 


k  1  k  1 

aa  = - a -  or  aa  = - I — T  » 

(Ga(0)  -  Ga(Fk))  (1  -  Ga(Fk) ) 

1  + 


(3. A. 3) 


respectively. 

From  (3.4.2)  we  have 

Ff  <  Fa  +  akLG  (Ff )  -  Ff]  .  (3.A.4) 

k+1  k  a  a  k  k 

We  will  now  distinguish  between  two  cases: 


case  a:  (G  (F.a)  <  F?) 
-  a  k  k 
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In  this  case  it  fellows  from  (3. A. 4)  and  (3.A.1)  that 


Fa  <  Fa 
rk+l  k 


and  therefore  by  the  induction  hypothesis 

Fa  <  c 
k+1  'a 


case  b:  (G  (F,a)  >  F?) 

d  K  K 


By  convexity  of  G  (•)  we  have  for  all  z<F? 

d  K 

.  Ga(z)  -  G_(F?) 

• /ra\  ^  a  a  K' 


*-Fk 


Taking  z=0  we  obtain 


G  (0)  -  G  (Fa) 
•G;(Fk>  <-A— - 


and  therefore  by  the  nonnegativity  of  G  (F?)  -  Fa  we  get  using  (3. A. 4) 

d  K  K 


and  the  fact  that  G  (  )  is  decreasing 


F,  .  <  Fa  + 


1 


(G  (Fa)  -  Fa) 
k+1  k  1-G‘  (FkP^  1  a  k  k 


or  by  the  nonnegativity  of  1-  G^F^) 


G  (F?)  -  Fa  +  (G’(Fa)  -  l)(Fa  -  Fa)  >  0.  (3. A. 5) 

d  K  K  d  K  K  ’  i  K 


Thus  (3. A. 5)  holds  for  a£  as  defined  by  (3.4.3)  and  (3.4,4).  Define 


H  ( Fa )  =  G  (Fa)  -  Fa. 
a  k  a  k  k 


Obviously  H(*)  is  convex  and  therefore  by  the  subdifferential  relation  and 
(3. A. 5)  we  obtain 


G  (Fa  )  -  Fa 
a  k+1  k+1 


H  (Fa  )  >  0 
a  k+1 
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and  as  a  result 


VFui> 


>  F 


k+1* 


(3. A. 6) 


Vl.l.o.g.  define  G  (z)  =  0  for  z  >  c  and  get  from  (3. A. 6) 

d  a 

F.a  <  c  , 
k+1  a’ 


and  (3.4.5)  is  proved. 

Once  we  have  proved  (3.4.5)  we  can  strengthen  '3.A.1)  by  observing  that 
now  (3.4.3)  and  (3.4.4)  imply 


lim  inf  of  >  0.  (3. A. 7) 

k-f«  * 


Notice  that  once  F®  belongs  to  case  b)  for  some  Y,  it  will  belong  to 
this  case  for  all  k>!T;  while  if  F*  belongs  to  case  a)  it  is  reduced  toward 
the  region  of  case  b)  as  Implied  by  (3. A. 7)  and  (3. A. 4)  at  least  as  fast  as 
a  geometric  progression  so  that  at  most  in  tne  limit  we  are  in  case  b)  and 

thus 

lim  sup  F£ 
k+o- 


satisfies  cise  b)  which  proves  the  lemma. 


Q.E.D. 
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Appendix  (4. A) 

Proof  of  Lemma  ( 4. A) :  a)  Fix  xeX,  zeH  and  y>l.  Denote 

a=x+z,  b=x+yz  (4.A.1) 

Let  7  and  7  be  the  projections  on  X  of  a  and  b  respectively.  It  will 
suffice  to  show  that 

■T>  -  x«  <  Yu7  -  xu.  (4. A. 2) 

If  7  ■  x  then  clearly  ¥  =  x  so  (4. A. 2)  holds.  Also  if  aeX  then  7  =  a  =  x  +  z 
so  (4. A. 2)  becomes  u¥  -  xu  <  yizu  =  Hb  -  x»  which  again  holds  since  P  is 
nonexpansive.  Finally  if  7  =  ¥  then  (4. A. 2)  also  holds.  Therefore  it  will 
suffice  to  show  (4. A. 2)  in  the  case  where  a  *  b,  a  t  x,  b  *  x,  aeX,  beX 
shown  in  Fi gure  (4.a.1) . 

Let  H  and  H  be  the  two  hyperplanes  that  are  orthogonal  to  (¥  -  7)  and 
a  b 

pass  through  7  and  ¥  respectively.  Since  <F  -  7,  b  -  ¥>  >  0  and  <¥  -  7, a  -  7> 
<  0  we  have  that  neither  a  nor  b  lie  strictly  between  the  two  hyperplanes  H 

a 

and  H  .  Furthermore  x  lies  on  the  same  side  of  H  as  a,  and  x£H  .  Denote 
b  a  a 

the  intersections  of  the  line  {x  +  a(b  -  x)<cteR}  with  H  and  H  by  s  and  s 

a  b  a  b 

respectively.  Denote  the  intersection  of  the  line  {x  +  a(7  -  x)|aeR)  with  H 

b 

by  w.  We  have 


Y 


ab  -  xu 
ua  -  xu 


usL  -  xu 
b 

USa  -  XII 
a 


uw  -  xu 
ua  -  xu 


uw  -  ai  Ha  •  xu 
u7  -  xu 


> 


u¥  -  an  +  ii a  -  xu 
u7  -  xu 


ii¥  -  xu 

>  — - 

ua  -  xu 


(4. A. 3) 
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where  the  third  equality  is  by  similarity  of  triangles,  the  next  to  last 
inequality  follows  from  the  orthogonality  relation  <w  -  F,F  -  a>  =  0,  and 
the  last  inequality  is  obtained  from  the  triangle  inequality.  From  (4. A. 3) 
we  Obtain  (4. A. 2)  which  was  to  be  proved, 
b)  Since  y  is  a  direction  of  recession  of  a,  we  have 


P^(x  +  z)  +  yefi  .  (4. A. 4) 

Thus  by  definition  of  projection  on  a  closed  convex  set 


<(x  +  z)  -  P^{x  +  z),(Pn(x  +  z)  +  y)  -  P^(x  +  z)>  <  0  (4. A. 5) 

or  equivalently 

< ( x  +  z)  -  P^(x  +  z),y>  <  0, 


and  (4.3.25)  follows. 


Q.E.D. 
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Appendix  (4.3) 


We  develop  the  main  arguments  for  the  proof  of  Proposition  (4.C)  through 
a  sequence  of  Lemmas.  In  what  follows  we  use  the  word  "eventually"  to  mean 
"there  exists  IT  such  that  for  all  k  >7",  where  7  may  be  different  for  each 
case. 

Lemma  (4.B.1):  Under  the  conditions  of  Proposition  (4.C),  lim  x  =  7  and 
.  k 

eventually 


I  =  A  .  (4.B.1) 

k  t 


Proof:  By  relation  (4.4.14),  since  x  is  a  limit  point  and  the  algorithm 
decreases  the  value  of  the  objective  function  at  each  iteration,  we  have 


lim  >Vl  -  y  ■  0, 


which  implies,  again  by  the  descent  property  and  the  fact  that  x  is  a 
strict  local  mi  mi  mum 

lim  x  =  7.  (4.B.2) 

k-*-®  k 


Therefore  from  (4.5.5) 


lim  e(x  )  =  efx)  =  0.  (4.B.3) 

k+«  k 


Since  the  set  I  is  finite  it  follows  from  (4.5.2),  (4.5.6)  and  (4.B.3)  that 
eventually 


I,CA  • 
k  t 


(4.B.4) 


To  show  the  reverse  inclusion  we  must  show  that  eventually 


<ai,xk>  >  bj  -  e(xk)  iia-j  ii  ,  T  ieA^.. 


(4.B.5) 
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By  the  Cauchy-Schwartz  Inequality  (B.3)  and  (4.5.5)  we  have  eventually 


EV"V  =  "Xk  '  P[\  '  vf(xk,]'*'a, '  >  <p[\  -  7f(*k>]  -  i^.a  >. 


Therefore  In  order  to  show  (4.B.5)  It  suffices  to  show  that  eventually 


<a1 »pl*k  -  5f(»k>]>  ■  bi , 


YU\ 


or  equivalently 

P[*  ’  Vf ( x  )  ] ex  +  N  . 

K  K  X 

Since  >  7  this  follows  from  Assumption  (4.C). 


Q.E.D, 


Lemma  (4.B.2):  Under  the  conditions  of  Proposition  (4.C)  for  each  ae(0,l], 
eventually  we  have 


x,  (a)ex  +  N  , 
k  t 


-V-  ae[a,l  ] . 


(4.B.6) 


Proof:  From  Lemma  (4.B.1)  we  have  x  *  x  and  eventually  C  =  TT  where 

k  k 

(f  =  { 2 *<2 , a-j >  <  0,  Y  1eA^}>  (4.B.7) 

Since  the  projection  of  -7f(T)  on  T  is  the  zero  vector  and  d^  is  eventually 
the  projection  of  -vf( xR )  on  Z  it  follows  that 


lim  dk  =  0, 
k-*-® 


(4.B.8) 


Since  3  is  the  projection  of  D  d  on  a  subset  of  C  ,  and  { iiD  »}  is  bounded 
*  n  k  k  k 

above  [cf.  (4.4.8),  (4.4.9)],  it  follows  that 


lim  cf  =  0. 

k+oo  k 


(4.B.9) 
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Since  -vf(xk)  =  d£  +  d^  and  g^  =  d£  +  cf^  we  obtain 


lim  g  =  Vf(x).  (4.B.10) 

k+®  K 

A  simple  argument  shows  that  Assumption  (4.B)  implies  that  for  all  cte[0,l] 

P(y)ex  +  N  Y  y  such  that  ifx  -  otVf(7)  -  yu  <  ct6  (4.B.11) 

For  any  cTe(0,l]j  equation  (4.B.10)  shows  that  we  have  eventually 

II X  -  aVfCx)  -  (x,  -  ag  )l  <  a6»  Y  aefa,l]. 

k  k 

Therefore  from  (4.B.11)  we  have  eventually 

X  (a)  =  P(x  -  ag  )ex  +  N  ,  Y  ae["a,l]. 

k  k  k  x 


Q.E.D 


Lemma  (4.B.3):  Under  the  conditions  of  Proposition  (4.C) 


lim  inf  a.  >  0. 
k+®  K 

Proof:  From  Lemma  (4.B.1)  we  have  1^  =  A^  and  x^  7,  while  from  (4.B.8)  we 
have  u gk u  +  n vf ( x”) u .  Therefore  from  Proposition  (4. A)  part  b)  [cf.  (4.3.27)] 
it  follows  that  there  exists  a  >  0  such  that  eventually 


<Vf(x,  ),x  -  x  ( a)>  >  a  <d  ,D  d  > 
k  k  k  k  k  k 

+  “  B  Xk  ( a)  -  (x|<  +  ad|<)l|2, 


T  ac(0,a]. 


Using  this  relation  we  get  that  eventually 
f(xk)  -  f[xk(a)]  > 
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>  <7f(Xk),X  - 

>  a  <djc,D|(d((> 


X(a)>  -  i  »x  (a)  -  X  II2 

7  k  k 

1  ~ 

+  ”iX|((a)  -  (Xk  +  adk)^ 

+  7»*k(a)  "  (Xk  +  adk)B 


L  2 

7«Xk(a)  -  Xk  0 


-LBa^ll2  -  L»Xk(a)  -  (xk  +  aC^t2 
>  a(l  -  aLX/j)<d.  ,0  d  >  + 

2  k  k  k 

1  ~  2 
+  ("a  "  L)  llXk(a)  -  (Xk  +  Otd^ )  ii 

where  the  second  inequality  follows  from 
t  x  +  yn2  <  2 u x ii 2  +  2 Dy M 2 


the  last  inequality  follows  from  (4.5.8)  and  L  corresponds  to  an  arbitrary 
nonempty  bounded  neighborhood  of  7.  Taking  any  a  >  0  satisfying 


_  A 
a  <  a, 


1  -  aLX 

2 


a,  aQ.  -  L)  >  a 
a 


we  obtain,  using  (4.5.14)  that 


1  im  inf  a.  >  a 

k+<»  K 

and  the  lemma  is  proved. 


Q.E.D. 


Proof  of  Proposition  (4.C);  The  fact  lim  x,  =  x  is  part  of  Lemma  (4.B.1), 

k+»  * 

while  (4.5.7)  follows  from  Lemmas  (4.B.2)  and  (4.B.3). 

In  order  to  show  (4.5.8)  we  note  that  from  Lemma  (4.B.1)  and  (4.B.8) 
we  have  eventually 


i 


\ 
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(4.B.12) 


ck=r, 


and 


Tim  d*  =  -vf{7). 
k+®  k 


(4.B.13) 


Equation  (4.B.13)  implies  that  eventually  Assumption  (4.B)  holds  with  d*  replac¬ 
ing  -vf(x)  and  6/2  replacing  6.  Therefore  for  all  ieA^  and  p^>0  such  that 
i|piai n  <  j  we  have 


P(x  +  d*  ±  p.a. )eT  +  N 
k  1  l  T 


(4.B.14) 


P(x  +  dk)ex  + 


(4.B.15) 


For  any  z1*z2eH  we  have  from  a  general  property  of  projection  on  X 
<z1  -  P(z1),P(z2)  -  P(z1)>  <  0 
<z2  -  P(z2),P{z1)  -  p ( z2 ) >  <  0. 

By  adding  these  two  inequalities  we  obtain 


I P(zx )  -  p( z2 )  il 2  <  <z1  -  z^Pfz^  -  P(z  )>,  V  z^ .z^eH 
By  applying  (4.B.16)  we  obtain 


(4.B.16) 


ii P ( x  +  d*  ±  p.a. )  -  P{x  +  d*)  #2 

<  <±P1-a.  ,PCx  +  dk  ±  p.a.)  -  P(7  +  d£)>. 


(4.B.17) 


Since  <a^  ,z>  =  0  for  all  zeN  ,  ieA  it  follows  from  (4.B.14),  (4.B.15) 

X  X 
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that  the  right  side  of  (4.B.17)  is  zero  and  therefore  eventually 

Pfx  +  d+  ±  p.ai )  =  P(7  +  d*),  Y  ieAy  • 

Since  from  (4.B.12)  we  have  eventually  d^eTf,  it  follows  that  P(7  +  d+)  =  7 

k  k 

and  therefore  also  • 

.—  +  — 

P(x  +  dk  ±  p^)  =  x,  t  i  eA^.. 

r 

Hence  eventually 

+  — + 

dk  ±  p^a^eC  ,  Y  ieA^ 

r 

which  implies  that 

<d*  ±  p.a.,y>  <  0,  Y  ycC,  ieA  .  (4.B.18) 

k  i  i  t 

■v 

Let 

ye{z IzcC  ,  <zad+>  =  0} . 

k  k  n. 

From  (4.B.12)  and  (4.B.18)  we  have  eventually 
<3^  ,y>  =  0  T  i  eAj., 

or  equivalently  yeh^.  Hence  eventually 
R*.  ^{zlzeCk,  <z,dk>  =  0} 
and  it  follows  that 

span  N  =  N  Z>  span{C  O  {zl<z,d+>  =  0}}  =  r 
t  t  k  k  k 

To  show  that  the  reverse  inclusion  note  that  if  yeN^.  then  by 
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Assumption  (4.B)  and  (4.B.12)  we  have 


<y.d£>  =  0. 


Since  N  C  IT  and  eventually  C  =  TT  it  follows  that  eventually 
"X  ic 

yeC^fl  {z I <z ,djc>  =  0}  and  a  fortiori  yespantC^  {z l<z ,djc>  -  0}} 


Therefore  eventually 


N  c  r. 

x  k 


and  the  proof  of  (4.5.8)  is  complete. 

Since  d  is  the  projection  of  -7f(x  )  on  C  {zl<z,d+>  =  0} 
k  k  k  k 

equation  (4.5.9)  follows  rasily  from  (4.5.8). 

Also  from  (4.5.7)  and  (4.5.8)  we  have  eventually  jcex  +  N  ,  <T  eN  , 

K  X  K  X 

dkeNx  and  d*  is  orthogonal  to  N^,  while  by  Lemma  (4.B.2)  the  vector  x^+1  is 
the  projection  of  +  a  (3^  +  d+)  on  7  +  N  .  Therefore  (4.5.10)  and 
(4.5.11)  follow. 

Q.E.D 
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Appendix  (5. A) 


Proof  of  Proposition  (5. A): 

We  prove  the  proposition  inductively.  It  is  easy  to  see  that  the 
Proposition  is  true  for  the  one  dimensional  case.  Assume  the  Proposition 
holds  for  the  (|S|  -l)-dimensiona1  case.  Consider  the  |S|  dimensional  one. 
By  taking  one  entry  equal  zero  und  using  assumption  (5.2.2)  to  ignore  the 
corresponding  equation,  it  can  be  easily  seen  that  we  are  back  at  an  (|S|-1) 
-dimensional  case  with  all  assumptions  holding  and  thus  by  the  induction 
hypothesis  we  get  that 

h(nO  3(rIsI)+)  *  S(R|S|)+  (5.A.1) 

|S|  + 

where  9  denotes  boundary  and  therefore  9(R  )  is  the  boundary  of  the  first 

orthant. 

By  (5.2.2)  and  (5.2.1)  we  get 

IS  I  + 

h(r)e(R  )  7  reft  .  (5.A.2) 

For  all  0  <  6^  <  w  let  fl(S^)  be  defined  by 

n  (6X)  =  { r | i  h(r)  H  <  «  }  (5. A. 3) 

then  by  the  continuity  of  h  and  8  8  and  by  assumption  (5.2.2)  and  since  the 
Jacobian  is  nonsingular  we  get  that 

int  n  (5^  *  $  7  6  >  0  (5. A. 4) 

a  (<$i)c  u  {62)<c  n  -v  62  >  61  >  0 

and  Q  (6^)  is  a  compact  set  for  all  6^  >  0. 

Let  B(0,<5^)  denote  the  closed  ball  of  radius  6  around  0,  then  if 
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(5. A. 5) 


h(Bii(61) )  n  6(0,5^  =  (RiSl)+A  B'(0,61) 

for  all  6 ^>0  we  are  done.  Otherwise  by  (5. A. 2}  there  exists  6 ^>0  such  that 

h{«  (51))AB(0,61)C  (R  I  S  I  )+A  ¥f 0, 6X) .  (5. A. 6) 

Let  v  belong  to  the  R.H.S.  and  not  to  the  L.H.S.  of  (5. A. 6).  By  the 
induction  hypothesis 

~  IS  1  + 
v  t  3(R  ') 

and  by  (5. A. 3) 

v  f  h(r)  "Y  reft  -  15  («  ) 

thus  w.l.o.g.  by  increasing  5  ,  we  have  that 

ve  int{(R*Sl)  f)  ¥(0,«i) } -  (5. A. 7) 

Now,  ft  (6j)  is  compact,  using  the  fact  that  a  continuous  function  maps  a  com¬ 
pact  set  to  a  compact  set  we  get  that  the  set  in  the  L.H.S  of  (5. A. 6)  is 
closed.  Moreover  from  (5. A. 4)  we  have  that  the  intersection  of  the 
interior  of  the  L.H.S  of  {5. A. 6)  and  the  R.H.S  of  (5. A. 6)  is  non-empty. 
Therefore  it  follows  from  {5. A. 7)  that  there  exists  a  point  v*  such  that 

★ 

veah(ft),  (5. A. 8) 

ve  int  (RlS!)  (5. A. 9) 

and  since  the  L.H.S.  of  (5. A. 6)  is  closed  there  exists  a  point  r*e  ft  such 
that 


158 


3 


ft 


v*=  h(r*). 


(5. A. 10) 


Moreover,  (5  A. 9) , (5. A. 10)  combined  with  (5.2.1)  imply  that 


re  int  fi 


(5.A.11) 


Using  (5. A. 11),  (5. A. 10),  and  the  nonsingularity  of  the  Jacobian  P(r  ) 
the  Inverse  Function  Theorem  [39]  precludes  (5. A. 8). 

Q.E.D. 
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